A GEMPACK user would expect that running the same simulation on (a) a desktop PC at work, and (b) a laptop at home, would yield the same results. However, in some cases results from the two simulations may be very slightly different. Usually the differences would be too small to notice. Yet in rare cases a simulation that seems to run fine on one PC crashes on another PC. This is particularly annoying if you share a simulation with a colleague, who then tells you that the simulation will not run on their PC, or does run but gives quite different results!
We explain below how results could differ between PCs, and how to combat the problem. The key take-away is that if you notice the problem, there is nearly always something wrong with the original simulation.
The most obvious explanation is human error. Bob ran a simulation and gave it to Susan for her to run. But due to some mix-up, the simulation that Susan ran differed in model, closure, shocks, solution method or initial data files, so Susan got different results. Clearly the first task is to check that the two PCs really were running the same simulation.
In our discussion below we focus on the case where a TABLO-generated EXE file created by Bob gives different results on Susan's PC. Another possibility is that Bob used a TABLO-generated EXE while Susan used GEMSIM. That might well lead to results that differed very slightly. Or, Susan might have used Bob's TAB file to recreate a TABLO-generated EXE file. If she used a different Fortran compiler or a different GEMPACK release, we might expect that her results would differ very slightly from Bob's. These very small differences arise because different compilers or GEMPACK releases may evaluate arithmetic expressions in a different order. Since computer arithmetic rounds each intermediate result, the sum A+B+C may not be identical to the sum C+A+B.
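The order-dependence of a sum is easy to reproduce in any language that uses IEEE 754 double-precision arithmetic. The Python sketch below is purely illustrative: the values of A, B and C are made up and deliberately extreme, and have nothing to do with GEMPACK internals.

```python
# Illustrative only: floating-point addition is not associative, so the
# order in which A+B+C is evaluated can change the answer.
A = 1.0e16
B = -1.0e16
C = 1.0

abc = (A + B) + C   # A and B cancel exactly, then C is added
cab = (C + A) + B   # C is lost when added to the much larger A

print("A+B+C =", abc)
print("C+A+B =", cab)
print("identical:", abc == cab)
```

In a real simulation the terms are rarely this badly scaled, so the discrepancy normally appears only in the last few digits.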
We return to the case where the TABLO-generated EXE file (and HAR, CMF files) created by Bob give different results on Susan's PC. These differences can arise from 'multithreading'. Modern CPUs contain several processing cores, and many programs are able to launch multiple threads (sub-tasks) which could run simultaneously on different cores, so speeding up computations [See footnote].
GEMPACK uses multithreading to speed up its linear solve (LU) phase. The linear solve requires large matrix multiplications. To speed these up, the matrices are chopped into parts called 'tiles', and the tiles are multiplied by several threads running in parallel. By default, the number of threads depends on how many cores the CPU has. As the threads finish each tile calculation, their results are added together to give the final result. Because computer arithmetic is not completely accurate, the final sum may differ very slightly according to how many threads are used. Ideally, simulation results will scarcely be affected by these small changes during the solution procedure.
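The effect of splitting a sum into tiles can be imitated with a few lines of Python. This is a toy sketch only: the function names running_sum and tiled_sum are hypothetical, the tiling is a simple contiguous split, and nothing here reflects how GEMPACK or its BLAS library actually partitions work. The values are chosen to exaggerate the effect.

```python
import math

def running_sum(values):
    """Add values one at a time, rounding after every step."""
    total = 0.0
    for v in values:
        total += v
    return total

def tiled_sum(values, n_tiles):
    """Split values into n_tiles contiguous tiles, sum each tile,
    then add the partial sums (as if one thread handled each tile)."""
    size = math.ceil(len(values) / n_tiles)
    partials = [running_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return running_sum(partials)

# Deliberately extreme values; the exact sum is 2.0.
values = [1.0e16, 1.0, -1.0e16, 1.0]

one_tile = tiled_sum(values, 1)
two_tiles = tiled_sum(values, 2)
print("1 tile: ", one_tile)
print("2 tiles:", two_tiles)
```

With these extreme inputs the two tilings disagree visibly; with well-scaled numbers the difference would be confined to the last few digits, which is the situation the text describes.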
For the two PCs to generate identical results, we need to ensure that they both use the same number of threads. GEMPACK uses the OpenMP framework to control multithreading. By default, OpenMP uses a number of threads equal to the number of logical processors made available by the CPU. The Help...About/Diagnostics command in TABmate or ViewHAR can be used to generate a diagnostic file. Section 2 of this file reports the number of 'logical CPUs' available to Windows, which is really the number of 'logical processors'. This number probably differs between Bob's and Susan's PCs. Bob might have seen:
SECTION 2: Operating system, hardware, session
.................
Windows detected 1 socket with 4 cores and 8 logical CPUs
The simulation log file tells how many threads were available to the simulation. Near the top of the log file look for:
BLAS library: OpenBlas
OPENMP number of threads: 8
The 'OPENMP number of threads' is the number of threads used by GEMPACK and is by default equal to the number of logical processors. We might see that Bob's PC has 8 logical processors while Susan's has 4. That could be the reason that they generate different results.
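As a quick cross-check outside GEMPACK, the rule described above (use OMP_NUM_THREADS if it is set, otherwise the number of logical processors) can be sketched in Python. The function name expected_omp_threads is hypothetical, and the sketch assumes OMP_NUM_THREADS holds a single positive integer; the simulation log file remains the authoritative record of what was actually used.

```python
import os

def expected_omp_threads(environ, logical_processors):
    """Thread count we would expect by default: OMP_NUM_THREADS if it is
    set to a positive integer, otherwise the number of logical processors."""
    raw = environ.get("OMP_NUM_THREADS", "").strip()
    if raw.isdigit() and int(raw) > 0:
        return int(raw)
    return logical_processors

if __name__ == "__main__":
    # os.cpu_count() reports logical processors (it may return None).
    logical = os.cpu_count() or 1
    print("logical processors:     ", logical)
    print("expected OpenMP threads:", expected_omp_threads(os.environ, logical))
```

Running this on both PCs gives a first indication of whether their thread counts are likely to differ.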
For the two PCs to generate identical results, we need to ensure that they both use 4 (the minimum of 8 and 4) or fewer threads. To do that, set the OMP_NUM_THREADS environment variable on Bob's PC to 4. One way to do this would be to open a CMD prompt on Bob's PC, and type:
setx OMP_NUM_THREADS 4
and then reboot the PC.
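When more than two PCs are involved, the same reasoning applies: pick the smallest thread count and set it on every PC that currently uses more. The Python sketch below only does the bookkeeping; the PC names and counts are the ones from the example above, and the helper name common_thread_count is hypothetical.

```python
def common_thread_count(threads_by_pc):
    """Largest thread count that every PC can match: the minimum."""
    return min(threads_by_pc.values())

# Thread counts as read from each PC's simulation log file.
threads_by_pc = {"Bob": 8, "Susan": 4}

target = common_thread_count(threads_by_pc)
needs_change = [pc for pc, n in threads_by_pc.items() if n != target]
command = f"setx OMP_NUM_THREADS {target}"

print("target thread count:", target)
print("PCs to change:", needs_change)
print("command to run there:", command)
```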
Another way would be to first close all CMD prompts and running GEMPACK programs, then use the Windows GUI (again on Bob's PC). Use the Windows key and search for "Edit environment variables for your account" and select the best match:
then create or edit the OMP_NUM_THREADS environment variable so that it has value 4. The picture below comes from Windows 11 -- your OS may show slightly different windows.
Next, rerun the simulation on Bob's PC and check that the log file shows, near the top:
OPENMP number of threads: 4
That would mean that Bob's simulation used the same number of threads (4) as Susan's simulation. In all cases known to GEMPACK support, this will be enough to ensure that the same results are generated on both PCs. However, as explained below, it may well be that the results are inaccurate or suspect!
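The check on the log files can be automated. The sketch below assumes only that each log contains a line of the form quoted above ("OPENMP number of threads: N"); the function name threads_from_log is hypothetical, and the sample log text is limited to the lines shown earlier on this page.

```python
import re

# Matches the log line quoted above, e.g. "OPENMP number of threads: 4".
THREADS_LINE = re.compile(r"OPENMP number of threads:\s*(\d+)")

def threads_from_log(log_text):
    """Return the thread count reported in a log, or None if not found."""
    match = THREADS_LINE.search(log_text)
    return int(match.group(1)) if match else None

# Sample text only; in practice read the two log files from disk.
bob_log = "BLAS library: OpenBlas\nOPENMP number of threads: 4\n"
susan_log = "BLAS library: OpenBlas\nOPENMP number of threads: 4\n"

bob_threads = threads_from_log(bob_log)
susan_threads = threads_from_log(susan_log)
print("Bob:", bob_threads, "Susan:", susan_threads)
print("same thread count:", bob_threads == susan_threads)
```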
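We have explained above how running the same simulation (with the same TABLO-generated EXE file) can give slightly different results on two different PCs. As noted earlier, other scenarios where Bob and Susan could get different results are that Bob used a TABLO-generated EXE while Susan used GEMSIM, or that Susan used Bob's TAB file to recreate the TABLO-generated EXE with a different Fortran compiler or a different GEMPACK release.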
Experience at GEMPACK support is that for all well-behaved simulations, the differences are too small to notice. By 'well-behaved' we mean:
When differences in results between two PCs are noticed, it is usually because the simulation was not well-behaved. For example, it may be that the original simulation was not solved accurately enough, or that endogenous values were extremely sensitive to small changes in the shocks. Some other suggestions are on this page. In short, once the problems in the simulation are identified and fixed, it will give very similar results on different PCs or with different releases of GEMPACK.
Another reason why two PCs might generate different results is described here. Again the problem is addressed by setting an environment variable. To minimize numerical differences between two PCs, you could set both OPENBLAS_CORETYPE and OMP_NUM_THREADS environment variables to the same values on both PCs.
Your CPU (or socket) contains several physical cores. But usually 'hyperthreading' is enabled in the BIOS so that the operating system interacts with 'logical processors' which are more numerous than the physical cores. In the past the number of logical processors has usually been twice the number of physical cores. However recent Intel CPUs contain both P (performance) and E (energy-efficient but slower) physical cores. Often, only the P cores allow hyperthreading. So an 8-core CPU containing 4 E and 4 P cores offers 12 logical processors. By default, GEMPACK (via OpenMP) uses a number of threads equal to the number of logical processors. However, the OMP_NUM_THREADS environment variable can be used to override this default.
Hyperthreading sometimes harms performance and has been implicated in security vulnerabilities. For these reasons it may be disabled in the BIOS -- in that case the number of logical processors = number of physical cores.
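The arithmetic in this footnote can be written out as a small Python function. It is a sketch under the assumptions stated in the footnote: each P core offers two logical processors when hyperthreading is enabled, and each E core offers one. The function name logical_processors is hypothetical, and real CPUs may not follow this pattern.

```python
def logical_processors(p_cores, e_cores, hyperthreading=True):
    """Logical processors, assuming only P cores support hyperthreading
    (two logical processors each) and E cores provide one each."""
    per_p_core = 2 if hyperthreading else 1
    return p_cores * per_p_core + e_cores

# The 8-core example from the footnote: 4 P cores and 4 E cores.
print(logical_processors(4, 4))                        # hyperthreading on
print(logical_processors(4, 4, hyperthreading=False))  # disabled in the BIOS
```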
Related pages:
Why does my simulation fail in GEMPACK 12, although it worked OK in earlier GEMPACK?
GEMPACK running slower on new PC?