Advice below applies to users of GEMPACK Releases 12 and 12.1. If you are running an earlier version of GEMPACK, the best way to speed up simulations is to upgrade (see this chart).
In late 2023 GEMPACK support became aware of a few cases where users started using a newer and better PC, but noticed that GEMPACK simulations took longer than on their old PC. We traced the problem to the CPU detection methods used by the OPENBLAS routines included in GEMPACK. We explain below how to diagnose and fix the problem.
During the last 30 years, Intel and AMD have constantly added new features (or instruction sets) to their CPUs, with names like MMX, SSE, SSE2, SSE4.1, AVX2 etc. Some programs will run much faster if they are able to use the newer instruction sets.
Release 12.0 and 12.1 Executable-image and Source-code GEMPACK (if using the GFortran compiler) use the OpenBLAS subroutines to speed up the linear solve (LU) phase. OpenBLAS contains many versions of each subroutine, each optimized for particular instruction sets. At runtime, OPENBLAS determines what type of CPU is used, and which instruction sets are supported. It uses this information to choose the fastest version of each subroutine.
Each CPU belongs to a 'family'; these have names like Katmai, Prescott, Athlon, Opteron, Sandybridge, Haswell, Zen, SkylakeX, and so on. When detecting CPU capabilities, OPENBLAS first tries to identify the CPU family, using a long list of CPU types known to the OPENBLAS developers at the time (your version of) the routines were developed. If it cannot recognize what family the CPU belongs to, it assumes the family is 'Katmai' -- the family name of the Pentium III CPU introduced in 1999. The Pentium III supported only a limited number of instruction sets, which have all been supported by nearly all subsequent CPUs.
Therefore, if you obtained GEMPACK in 2020 and install it on your new PC in 2024, it is possible your new PC has a CPU unknown to OPENBLAS in 2020 or previously. Then OPENBLAS will default to using the simpler, slower Pentium III instructions -- resulting in longer simulation times.
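The fallback behaviour described above can be illustrated with a small Python sketch. This is not OPENBLAS's actual detection code: the shortened family list, the function name `choose_family` and the made-up CPU name `SomeFutureCPU` are all assumptions chosen for illustration.

```python
# Hypothetical, heavily shortened list of the CPU families that an older
# build of the library knows about (the real list is much longer).
KNOWN_FAMILIES = {"Katmai", "Prescott", "Sandybridge", "Haswell", "Zen", "SkylakeX"}

# Family assumed when the CPU is not recognized (the Pentium III family).
FALLBACK_FAMILY = "Katmai"


def choose_family(detected_family):
    """Return the family whose optimized routines would be used."""
    if detected_family in KNOWN_FAMILIES:
        return detected_family
    # Unknown CPU: fall back to the oldest, most widely supported family.
    return FALLBACK_FAMILY


if __name__ == "__main__":
    print(choose_family("Haswell"))        # a recognized family is kept
    print(choose_family("SomeFutureCPU"))  # an unrecognized CPU falls back
```

The point of the sketch is that an unrecognized CPU is not treated as an error; it silently gets the slowest routines, which is why the problem shows up only as longer run times.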
First, we need to find out which family OPENBLAS has assigned your CPU to. Download the program tellmeopenblas.exe and run it from the command line (i.e., in a CMD prompt). tellmeopenblas should write out the CPU family it has identified -- a name like Haswell or SkylakeX. But if it reports 'Katmai', your (older) OPENBLAS will fall back to using the slower Pentium III instructions. Fortunately there is a way to override this behaviour. If tellmeopenblas reports 'Katmai', you should follow the instructions below.
We can force OPENBLAS to use particular instruction sets by creating the OPENBLAS_CORETYPE environment variable and setting it to the name of a particular CPU family. We have found that setting OPENBLAS_CORETYPE to 'Haswell':
One way to do this would be to open a CMD prompt and type:
setx OPENBLAS_CORETYPE Haswell
and then reboot the PC.
Another way would be to first close all CMD prompts and running GEMPACK programs, then use the Windows GUI. Use the Windows key and search for "Edit environment variables for your account" and select the best match:
then create or edit the OPENBLAS_CORETYPE environment variable so that it has value 'Haswell'. The picture below comes from Windows 11 -- your OS may show slightly different windows.
After doing this you could run tellmeopenblas again -- this time it should report 'Haswell'.
Then you could rerun your simulation -- it may well run quicker!
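If Python happens to be installed on the PC, a short script offers another way to confirm that the variable is visible to newly started programs. This is only a sketch: the function name `coretype_status` is hypothetical, and the script reads the environment only; it does not ask OPENBLAS which family it selected (tellmeopenblas does that).

```python
import os


def coretype_status(env=None):
    """Describe the OPENBLAS_CORETYPE setting seen by this process."""
    # Use the real process environment unless a mapping is supplied.
    env = os.environ if env is None else env
    value = env.get("OPENBLAS_CORETYPE")
    if value is None:
        return "OPENBLAS_CORETYPE is not set in this process"
    return f"OPENBLAS_CORETYPE = {value}"


if __name__ == "__main__":
    print(coretype_status())
```

A value set with setx is picked up only by programs started afterwards, so the script should be run from a freshly opened CMD prompt.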
The coretype identified by OPENBLAS will be shown near the top of GEMPACK log files, so there will be no need to run tellmeopenblas. A later version of the OPENBLAS library will be used, which is more likely to recognize modern CPUs.
If you run the same simulation on two different PCs with different CPUs, it is possible that the two CPUs support different instruction sets. In that case OPENBLAS may choose different versions of various subroutines, with the choice tailored to the current CPU. The different versions of a subroutine (say, to sum a vector) should yield results that are extremely similar, yet not necessarily identical. Therefore, the same simulation could yield slightly different results on two different PCs. Possibly, the problem might be reduced by setting the OPENBLAS_CORETYPE environment variable to 'Haswell' on both PCs.
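The vector-sum case can be reproduced in a few lines of Python. The sketch below is not OPENBLAS code; it only shows the general floating-point effect, namely that adding the same numbers in a different order (as differently optimized routines may do) can change the last digits of the result. The function name `running_sum` is chosen for illustration.

```python
import math


def running_sum(values):
    """Add the values one by one, in the order given."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1, 0.2, 0.3]
forward = running_sum(values)             # ((0.1 + 0.2) + 0.3)
backward = running_sum(reversed(values))  # ((0.3 + 0.2) + 0.1)

if __name__ == "__main__":
    print(forward)   # 0.6000000000000001
    print(backward)  # 0.6
    print(forward == backward)              # False
    print(math.isclose(forward, backward))  # True
```

The two totals differ only in the last digit. Over the many operations of a large linear solve, differences of this kind can accumulate into small visible differences in simulation results.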
Another reason why two PCs might generate different results is described here. Again the problem is addressed by setting an environment variable. To minimize numerical differences between two PCs, you could set both OPENBLAS_CORETYPE and OMP_NUM_THREADS environment variables to the same values on both PCs.
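One way to apply both settings to a single run, without changing them for the whole account, is to launch the program with a modified copy of the environment. The following Python sketch only builds that environment. The function name `pinned_environment` is hypothetical, and the thread count of 1 is an assumption made for illustration; the text above does not prescribe a value, only that it be the same on both PCs.

```python
import os


def pinned_environment(coretype="Haswell", num_threads=1, base_env=None):
    """Return a copy of the environment with both variables set."""
    # Copy, so that the environment of the calling process is not altered.
    env = dict(os.environ if base_env is None else base_env)
    env["OPENBLAS_CORETYPE"] = coretype
    # Environment variable values must be strings.
    env["OMP_NUM_THREADS"] = str(num_threads)
    return env


if __name__ == "__main__":
    env = pinned_environment()
    print(env["OPENBLAS_CORETYPE"], env["OMP_NUM_THREADS"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting the simulation program. Using the same two values on both PCs is what matters for comparing results.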
Related pages:
Why does my simulation fail in GEMPACK 12, although it worked OK in earlier GEMPACK?
How can same simulation give different results on another PC?