Sunday, December 19, 2021

Optimum Number of Threads

The latest release of JaamSim (2021-06) can execute multiple simulation runs in parallel on a multi-core computer. The number of parallel runs is specified by the 'NumberOfThreads' input for the Simulation object. Although there is no upper limit on this input, specifying too many threads results in excess context switching, which reduces execution speed. So, how does one choose the best value for this input, and how much faster will a set of simulation runs be completed?

The 'Run Progress' pop-up, which appears when the runs are started, provides a way to answer these questions.


In addition to showing the scenarios and replications that are being executed on each thread, the number in red in the bottom left corner shows the estimated rate at which simulation runs are being completed. This value is updated every 5 seconds based on the amount of progress made since the previous update. The optimum number of threads is the one that generates the greatest number of runs executed per hour.
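
The rate calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not JaamSim's actual code: the function name, its parameters, and the sample numbers are all assumptions made for the example.

```python
def runs_per_hour(completed_before: float, completed_now: float,
                  interval_seconds: float = 5.0) -> float:
    """Estimate the completion rate from the progress made in one update interval.

    Progress is measured in runs, so a partly finished run counts as a fraction.
    The 5-second default mirrors the update interval of the Run Progress pop-up.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    runs_done = completed_now - completed_before
    return runs_done * 3600.0 / interval_seconds


# Illustrative numbers only: a quarter of a run completed in one 5-second interval.
print(runs_per_hour(10.0, 10.25))  # 180.0
```

Because each estimate covers only the most recent interval, the displayed value can jump around from one update to the next, which is one reason to average several readings.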

The only way to be sure that the right number of threads has been chosen is to experiment with the NumberOfThreads input for the model in question. In most cases, however, the optimum number will be the same for all models and will be determined by the type of CPU installed in the computer. The following analysis was performed for two computers: a laptop and a desktop, both with Intel i7 processors that have four cores and hyper-threading. Potentially, these processors could provide up to eight threads (4 cores x 2 for hyper-threading).

The model used for the analysis was the 'Factory Example' model, which can be found by clicking Help > Examples > Factory Model. After setting the 'RunDuration' input to 30 years and turning 'Real Time' mode off, the model was executed for a series of runs with various inputs for NumberOfThreads. Before taking any measurements, the model was allowed to run for one minute so that the Java virtual machine could perform its just-in-time compilation. All runs were performed with no other applications open and with JaamSim's view and tool windows closed. The runs per hour value shown in the Run Progress pop-up was recorded when the first thread reached 50% complete. The process was repeated three times for each NumberOfThreads input and the average runs per hour was calculated.

The following graph shows the ratio of the runs per hour for a given NumberOfThreads input divided by the runs per hour value for one thread.


The graph shows that a NumberOfThreads input of 3 gave the best performance for the laptop computer and very nearly the best performance for the desktop computer. The poorer performance of the laptop computer with larger NumberOfThreads inputs is likely due to its limited cooling capacity. Note that hyper-threading on the Intel CPU did not make a larger value of NumberOfThreads worthwhile.
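
For anyone repeating the experiment on their own machine, the averaging and normalization behind the graph can be sketched as follows. The function names are made up for this example, and the readings are placeholders chosen to keep the arithmetic simple. They are NOT the measurements taken for this post.

```python
from statistics import mean


def relative_speeds(measurements: dict[int, list[float]]) -> dict[int, float]:
    """Average the repeated runs-per-hour readings for each thread count,
    then divide each average by the one-thread average."""
    if 1 not in measurements:
        raise ValueError("a one-thread baseline is required")
    averages = {threads: mean(values) for threads, values in measurements.items()}
    baseline = averages[1]
    return {threads: avg / baseline for threads, avg in sorted(averages.items())}


def best_thread_count(ratios: dict[int, float]) -> int:
    """Return the thread count with the highest relative speed."""
    return max(ratios, key=ratios.get)


# Placeholder readings (three per thread count), NOT real measurements.
readings = {
    1: [100.0, 100.0, 100.0],
    2: [180.0, 190.0, 200.0],
    3: [240.0, 250.0, 260.0],
    4: [230.0, 240.0, 250.0],
}
ratios = relative_speeds(readings)
for threads, ratio in ratios.items():
    print(f"{threads} thread(s): {ratio:.2f} x the one-thread rate")
print("Best NumberOfThreads for these placeholder readings:", best_thread_count(ratios))
```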

The optimum run execution speed was about 2.2 to 2.5 times the rate for a single thread, which is about 73% to 83% of the potential processing power of three threads.
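
The percentages are simply the measured speed-up divided by the number of threads. A short Python check, with a function name chosen only for this example:

```python
def parallel_efficiency(speedup: float, threads: int) -> float:
    """Fraction of the ideal speed-up (one full unit per thread) actually achieved."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return speedup / threads


# The two ends of the range reported above, both measured with three threads.
for speedup in (2.2, 2.5):
    print(f"{speedup} x on 3 threads -> {parallel_efficiency(speedup, 3):.0%} of ideal")
```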

Although no tests were performed with CPUs that have more than four cores, it seems reasonable to guess that the following two formulae will hold approximately for other multi-core processors, with the execution speed expressed as a multiple of the single-thread rate:

Optimum Number of Threads = (Number of Cores) - 1

Optimum Run Execution Speed = (0.8)(Optimum Number of Threads)
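
The two rules of thumb are easy to turn into a small Python helper. The function names are invented for this example, and the results for anything other than four cores are extrapolations from the guess above, not measurements. The sketch also assumes at least two physical cores, since the formulae say nothing useful about a single-core machine.

```python
def optimum_threads(cores: int) -> int:
    """Rule of thumb: use one thread fewer than the number of physical cores."""
    if cores < 2:
        raise ValueError("the rule of thumb assumes at least two physical cores")
    return cores - 1


def estimated_speedup(cores: int, efficiency: float = 0.8) -> float:
    """Estimated speed relative to one thread: efficiency x optimum number of threads.

    The 0.8 default is the approximate efficiency observed on the two
    four-core test machines; other CPUs may behave differently.
    """
    return efficiency * optimum_threads(cores)


# Physical core counts, not hyper-threaded logical processors.
for cores in (4, 8, 64):
    print(f"{cores} cores: NumberOfThreads = {optimum_threads(cores)}, "
          f"about {estimated_speedup(cores):.1f} x one thread")
```

Note that the operating system often reports logical processors (eight on the test machines), so the physical core count may need to be looked up in the CPU's specifications.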

In some ways, it is disappointing that only three threads can be used for a CPU that is rated at eight threads. Nevertheless, the ability to execute a set of simulation runs 2.2 - 2.5 times faster is still a very satisfactory outcome for this new feature. Even bigger benefits should be available with newer CPUs that can have as many as 64 cores.