Sunday, November 30, 2014

The Fastest Simulation Software

One of my goals for 2014 was to perform the execution speed benchmarks described in my previous posts on a wide selection of mainstream simulation software packages. The software packages were compared by measuring the execution times for three basic activities:
  • executing a delay block
  • creating and destroying an entity
  • seizing and releasing a resource
Note that in previous posts, the third benchmark was the time to execute a "simple" block, which was taken to be one-half of the time to seize and release a resource. The new benchmark avoids this inference and presents the quantity that was actually measured.

Execution times were measured by counting the number of entities processed in 60 seconds, measured with a stopwatch. The posted results are the averages of three runs for each benchmark. All the runs were performed on the same laptop computer with a Second Generation ("Sandy Bridge") Core i5 processor running at 2.5 GHz. To make the results less machine specific, execution times were converted to clock cycles.
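The conversion from measured throughput to clock cycles is straightforward. A minimal sketch of the calculation (the 2.5 GHz clock rate is from the text; the entity count is hypothetical):

```python
CPU_HZ = 2.5e9  # 2.5 GHz laptop, as described above

def cycles_per_activity(entities_processed, run_seconds=60.0, cpu_hz=CPU_HZ):
    """Convert a 60-second throughput count to clock cycles per activity."""
    return run_seconds * cpu_hz / entities_processed

# e.g. a hypothetical 100 million entities in 60 s -> 1,500 cycles each
print(cycles_per_activity(100_000_000))
```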

The following bar graph presents the results of the comparison.

Previous versions of this graph were labelled as "preliminary" to provide a chance for the vendors or other interested parties to improve the benchmark models for the individual software packages. Now that I have corresponded with most of the vendors and incorporated their suggestions, these results can be considered to be final for the specified version of each software package.

Revision 1: New results for Arena. Selecting the "Run > Run Control > Batch Run (No Animation)" option speeds up the benchmarks by a factor of ten. My thanks to Jon Santavy and Alexandre Ouellet for pointing this out.

Revision 2: New results for SIMUL8. Setting travel times between objects to zero resulted in a large improvement for the first and third benchmarks. The default setting for SIMUL8 is to assign a travel time in proportion to the distance between work centres, which increased the computational effort for these two benchmarks. The setting can be changed to zero travel time by selecting File > Preferences > Distance. My thanks to Sander Vermeulen for auditing and correcting the benchmark models.

Revision 3: New results for FlexSim. Execution speed was increased by approximately 10% by closing the “Model View (3D)” window during the run. The seize/release resource benchmark was also added to the results. After gaining more experience with FlexSim, it became clear that the Processor and Operator objects used for the first and third benchmarks are more complex objects than the simple seize, release, resource, and delay blocks that these two benchmarks are intended to evaluate. Since each object performs a series of actions to process an incoming entity, rather than just a single action, the results for the two benchmarks cannot be compared on a like-to-like basis with the other software packages. My thanks to Bill Nordgren for his help with the benchmarks.

Revision 4: New results for ExtendSim. Execution speeds were increased significantly by replacing the "Activity" blocks in Benchmarks 1 and 3 with "Workstation" blocks. The Workstation block is faster because it omits functionality supported by the Activity block (pre-emption, shut-downs, and state statistics), which reduces its overhead. It may be possible for ExtendSim users to increase execution speed further by creating a customized Activity block with any unneeded functionality stripped from the ModL code. My thanks to Peter Tag for his guidance on ExtendSim.

Revision 5: New results for Simio. The entity creation and destruction benchmark was revised to use tokens instead of entities. All three benchmarks are now token-based. Tokens were used for the benchmark because they provide the same capabilities as the basic entity provided by some of the other simulation packages. The corresponding times for Simio's entity-based benchmarks are many times longer than the token-based ones. My thanks to David Sturrock for preparing the new benchmark models.

Revision 6: New results for Arena. All three benchmark models were revised to use elements instead of modules to avoid unnecessary overhead. It is common practice for Arena modellers to avoid modules in large models where execution speed is an important factor. The module-based benchmark times are about 50% longer than the element-based times. My thanks to Alexandre Ouellet for preparing the new benchmark models.

Revision 7: Results for Simio 7.119. The latest release of Simio shows a significant improvement in execution speed for seizing and releasing a resource (using tokens). Processing time was reduced from 21,000 clock cycles for release 7.114 to 4,300 clock cycles for release 7.119.

I should caution readers not to put very much importance on differences in execution speeds of less than a factor of two. Ease of use and suitability for the system to be modelled will always be the primary criteria for selecting simulation software. Execution speed is only important if it is a significant time saver for the analyst (impacting ease of use) or if it is required for the model to be tractable (impacting suitability for the system to be modelled). For some systems, even very slow software will be quite fast enough.

To put the execution times in perspective, consider the number of times an activity can be performed in one minute of processing time on a 2.5 GHz laptop computer. An activity requiring 150 clock cycles can be executed one billion times in one minute. One requiring 1,500 clock cycles can be executed 100 million times in one minute. Even one requiring 150,000 clock cycles can be executed one million times in one minute. In most cases, the fastest benchmark times measured by this exercise will be important only for very large models that require hundreds of millions of blocks to be executed over the course of each simulation run.

SLX holds the crown as the fastest of the simulation packages, with times of 60, 110, and 230 clock cycles; however, it differs from the others in that it is strictly a programming language and lacks any drag & drop capabilities. JaamSim is the fastest of the simulation packages that include a drag & drop interface, with times of 240, 530, and 2,400 clock cycles. Arena, SIMUL8, AnyLogic, and Simio turn in solid results in the 1,000 - 10,000 clock cycle range. ExtendSim is significantly slower, with one benchmark time that exceeds 20,000 clock cycles. The results for FlexSim are not directly comparable to the other software packages and are provided for reference only (see the notes for Revision 3).

Sunday, September 7, 2014

Another Big Performance Increase for Release 2014-36

Faster Event Processing

Harvey Harrison has done it again -- event scheduling and processing is about 50% faster in this week's release (2014-36). On my laptop, the benchmark runs at nearly 8 million events/second compared to 5 million last week. It wasn't many months ago when I was excited about 2 million events/second. The following graph shows the full results for the benchmark.

SLX Software

A new entry on the above graph is the result for SLX from Wolverine Software. The developer, Jim Henriksen, had sent me this result several months ago along with the input file. The results for the other two benchmarks were described in a post Jim made to the LinkedIn group for the Society for Modeling & Simulation International a few weeks ago. You can read his post here.

Ideally, I should re-run the SLX benchmarks myself, but have not found the time to explore SLX enough to do so yet. In the interest of keeping things moving, the results Jim provided are shown in all the latest graphs.

It is fair to say that SLX sets the gold standard for execution speed. This is one of its key features, and SLX achieves this status by being the only simulation software to provide a compiler for its proprietary programming language. Somewhat more effort is required to build an SLX model -- there is no drag and drop model building environment -- but the pay-off is the shortest possible execution time.

Concluding Remarks

With Harvey's latest improvements, JaamSim processes events about 6 times faster than Simio, but is still about 2.7 times slower than SLX. JaamSim's event processing code is unlikely to get any faster -- all reasonable optimizations have been made already -- so we will have to accept that SLX is faster in this area. To be completely honest, we were pleasantly surprised to have got this close to SLX.

To put the event processing speed differences in perspective, consider a model that requires one billion events to be executed. On my 2.5 GHz laptop, the event processing portion of the simulation run would total 13 minutes in Simio, 2.1 minutes in JaamSim, and 0.8 minutes in SLX. For this number of events, the difference between SLX and JaamSim amounts to only 1.3 minutes out of what is likely to be a much longer total duration for the run. Nearly 10 billion events would be required for the difference between SLX and JaamSim to reach 10 minutes. Beyond 10 billion events, SLX is likely to have a significant advantage over JaamSim. Below 1 billion events, the difference is likely to be insignificant.
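The arithmetic behind these durations can be sketched as follows, using the approximate rates implied above (JaamSim at roughly 8 million events per second, SLX roughly 2.7 times faster):

```python
def run_minutes(events, events_per_second):
    """Total event-processing time, in minutes, for a simulation run."""
    return events / events_per_second / 60.0

jaamsim = run_minutes(1e9, 8e6)        # ~2.1 minutes for one billion events
slx = run_minutes(1e9, 8e6 * 2.7)      # ~0.8 minutes
print(round(jaamsim - slx, 1))         # a difference of ~1.3 minutes
```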

Now that we have good performance for event processing, our next objective is to bring JaamSim's event creation/destruction time into line with the other benchmarks. SLX achieves its excellent result for this benchmark by providing an internal entity pool, so that used entities can be recycled instead of being destroyed. Up to now, we have shied away from an entity pool for JaamSim, preferring to focus our attention on making entity creation more efficient. Moving forward, we may have to reconsider various ways to implement an entity pool.

Thursday, August 28, 2014

Huge Performance Increase for Release 2014-35

Some programming magic by Harvey Harrison and Matt Chudleigh has resulted in a major increase in JaamSim's execution speed for release 2014-35. The following graph shows the latest result for Benchmark 1 compared to those in my previous posts.

Benchmark 1 - Event Scheduling and Processing

Event Scheduling and Processing
The bottom line on the graph is the result for JaamSim2014-35 with the graphics for the EntityDelay object turned off. Normally, EntityDelay shows the various entities moving along a line representing the fraction of time completed for each delay. Unfortunately, this graphic requires the data for each entity to be stored in a HashMap, adding considerable overhead unrelated to scheduling and executing events. A more realistic benchmark is obtained when this overhead is turned off.

A practical example of JaamSim's speed advantage was revealed a few weeks ago when we delivered a very complex supply chain model to a mining industry client for their internal use. The model was prepared in our TLS software -- an add-on to JaamSim. The TLS model required only 7 minutes to complete a simulation run, while the less-detailed Arena model it replaced required 35 minutes. Our client was very pleased by this unexpected boost to his productivity.

Friday, August 22, 2014

What is in a name?

Selecting a name for a new software product such as JaamSim can be a vexing process. Names are important -- a rose by any other name might smell as sweet -- but it would not sell very well if it were hard to pronounce or evoked something negative.

The Igor Naming Guide provided some excellent advice, but it was hard to put it into practice. We thought of all sorts of clever, evocative names only to find that they had been used twice over or that someone was already sitting on the .com domain name. Even the names we had rejected as being too dumb or cute had been used already.

According to Igor, an acronym is one of the worst possible ways to name something. However, we were getting desperate and the name JamSim -- "Java Animation, Modeling, and Simulation" -- seemed promising. That is, it did until a Google search showed that it had already been used for "Java MicroSimulation". The domain name was available, but there were too many references to JamSim for this name to be a viable option. How about "JaamSim" then?

The name JaamSim seemed pretty weak until a search on Google revealed that the "Jaam-e Jam" or Cup of Jamshid is a magical wine bowl from Persian mythology. By looking into the bowl, one could see people and events taking place in other locations: 

"The whole world was said to be reflected in it, and divinations within the cup were said to reveal deep truths."

This was great! I had been interested in Persian culture ever since reading The Rubáiyát of Omar Khayyám at an impressionable age. The idea that our software might "reveal deep truths" had me hooked, so the name "JaamSim" won out in the end.

Now that you know the story of its name, perhaps JaamSim will help you to reveal your own deep truths.

Monday, July 21, 2014

Benchmarking Part 3 - Executing Model Blocks


My first post on the topic of benchmarking discrete event simulation software identified three processes that could potentially bottleneck a typical simulation model:
  1. Event scheduling and processing
  2. Creating and destroying entities
  3. Executing model blocks that do not involve simulated time
This post deals with the last item on the list - the time required to execute a model block that does not involve simulated time. This benchmark may seem a bit vague since there are many types of model blocks that do not advance simulated time. Its intent is to capture the overhead associated with moving an entity from one block to the next in a process flow type simulation model.

In a perfect world, we would benchmark a wide variety of blocks for each simulation software package. No doubt, the efficiency of each software package will vary with the type of block. Software A might be much more efficient than software B for one block, but much less efficient for another block. To get started, we chose to benchmark the blocks that seize and release a resource. These blocks are commonly used in simulation models and are implemented in one form or another in every simulation software package. Very little computation is required to seize or release a resource - only statistics collection - so we expect that it provides an approximate measure of the overhead time to move an entity from one block to another.

Model Block Execution Benchmark

The model used to benchmark the execution of model blocks that do not advance simulated time is shown in the following figure.

In this model, two entities are created at time zero and directed to the Seize block. The first entity seizes the resource and executes a one second delay. The second entity enters the Seize block's queue to wait for the resource. On completing the one second delay, the first entity releases the resource and is returned to the Seize block. This process continues endlessly, with one entity completing the delay during each second of simulated time. Two entities were used in the model to ensure that the Seize block always had an entity to process, avoiding a potential source of inefficiency for some software packages.
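The model described above can be sketched as a hand-rolled event loop in Python. This is a generic stand-in for the packages being benchmarked, not code from any of them, and it runs for a fixed amount of simulated time rather than the 60 seconds of wall-clock time used in the real measurements:

```python
import heapq
from itertools import count

def seize_release_benchmark(end_time):
    """Two entities contend for one resource: each seizes it, delays
    1 s, releases it, and loops back. Returns the Release count."""
    future_events = []   # (time, seq, action) min-heap
    seq = count()        # tie-breaker so the heap never compares actions
    queue = []           # entities waiting to seize the resource
    busy = [False]       # resource state
    releases = [0]

    def schedule(t, action):
        heapq.heappush(future_events, (t, next(seq), action))

    def try_seize(t, entity):
        if busy[0]:
            queue.append(entity)          # wait in the seize queue
        else:
            busy[0] = True                # seize the resource
            schedule(t + 1.0, lambda t2: release(t2, entity))

    def release(t, entity):
        releases[0] += 1
        busy[0] = False                   # release the resource
        if queue:
            try_seize(t, queue.pop(0))    # hand off to the next waiter
        try_seize(t, entity)              # loop this entity back

    for e in ("entity1", "entity2"):      # two entities created at time zero
        schedule(0.0, lambda t, e=e: try_seize(t, e))

    while future_events:
        t, _, action = heapq.heappop(future_events)
        if t > end_time:
            break
        action(t)
    return releases[0]
```

As in the text, one entity completes the delay during each second of simulated time, so the Release block executes once per second.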

As with the previous benchmarks, the average time to seize and release a resource was measured by running the model for 60 seconds of real time (using a stopwatch) and counting the number of times the Release block was executed. The effect of computer speed was allowed for by converting the calculated time into clock cycles. All measurements were made using my laptop computer which has a second generation (Sandy Bridge) Core i5 processor running at 2.5 GHz.

Performance Results

The results of the benchmark for Arena, Simio and JaamSim are shown in the following bar chart. 

The time to execute a model block was calculated by taking the average execution time per entity for the benchmark, subtracting the time to execute the delay, and dividing by two. The time for the delay was taken from the first benchmark for each software package. It was necessary to divide by two since two blocks were executed for each trip through the benchmark - a seize block and a release block.
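The calculation described above can be written out directly (the 2.5 GHz clock rate is from the text; the entity count and delay time below are hypothetical):

```python
CPU_HZ = 2.5e9  # 2.5 GHz laptop

def block_time_cycles(entities_in_60s, delay_cycles):
    """Cycles per block: total cycles per entity, minus the delay's
    cycles, divided by the two blocks (seize + release) per loop."""
    cycles_per_entity = 60.0 * CPU_HZ / entities_in_60s
    return (cycles_per_entity - delay_cycles) / 2.0

# hypothetical: 25 million loops in 60 s with a 1,000-cycle delay
print(block_time_cycles(25_000_000, 1_000))  # 2500.0 cycles per block
```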

Only the result for JaamSim2014-20 is shown - there was no difference between the values for the three versions shown in the previous posts.

The benchmark results show that JaamSim requires very little time to process simple blocks such as Seize and Release.

Concluding Remarks

This post concludes the series of three on the topic of benchmarking. Thanks to the hard work carried out by Harvey Harrison and Matt Chudleigh, JaamSim is now significantly faster.

Saturday, July 19, 2014

Benchmarking Part 2 - Creating and Destroying Entities


My previous post identified three processes that can bottleneck a typical simulation model:
  1. Event scheduling and processing.
  2. Creating and destroying entities.
  3. Executing model blocks that do not involve simulated time.
Each of these activities has the potential to be slow enough to bottleneck a model's execution speed. By measuring the processing times for these activities, we can assess the relative strengths and weaknesses for each simulation software package.

Event scheduling and processing was investigated in my previous post. I now move on to the second item on the list: creating and destroying entities.

Entity Creation and Destruction Benchmark

The simulation model used to benchmark entity creation and destruction is simplicity itself. An EntityGenerator is used to create a series of entities that are sent immediately to an EntitySink block for destruction.  The following figure shows the JaamSim model for this benchmark.

Benchmark Model
The average time to create and destroy an entity was measured by running the model for 60 seconds of real time (using a stopwatch) and counting the number of entities that were destroyed. The effect of the computer speed was allowed for by converting the calculated time into clock cycles. All measurements were made using my laptop computer which has a second generation (Sandy Bridge) Core i5 processor running at 2.5 GHz.
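The benchmark model amounts to the following loop -- a Python stand-in for the EntityGenerator and EntitySink objects, not JaamSim code. Timing this loop and converting to clock cycles would give the benchmark figure:

```python
def create_destroy_benchmark(n):
    """Create n entities and immediately destroy them, counting the
    destructions at the 'sink'."""
    destroyed = 0
    for i in range(n):
        entity = {"id": i}   # EntityGenerator: create an entity
        del entity           # EntitySink: destroy it (reclaimed by the GC)
        destroyed += 1
    return destroyed
```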

Performance Results

The following bar chart shows the results of the benchmarking exercise for Arena, Simio, and for the same three versions of JaamSim that were benchmarked in the previous post.

The first thing to note about these results is that entity creation and destruction is a very time consuming process! Even for the fastest version of JaamSim, the entity creation/destruction time is about fifteen times longer than the time to execute a delay. This activity can easily be the bottleneck for a simulation model that processes large numbers of entities.

The version of JaamSim initially tested, JaamSim2014-11, was quite slow due to the slow method for event scheduling and processing used in that version (one future event is needed to generate each entity at a future time). Once a faster method was used in JaamSim2014-16, its time was only slightly longer than Arena's and more than twice as fast as Simio's. For the final version tested, JaamSim2014-20, some simple optimizations were performed that reduced the processing time by a factor of three, to "only" 23,000 clock cycles.

Concluding Remarks

Although JaamSim is now significantly faster, there is still room for further improvement. The time to create and destroy an entity is about fifteen times longer than the time to execute a time delay.

The slowness of entity creation/destruction is significant only when very large numbers of entities are to be created and destroyed. On the laptop computer used for these benchmarks, JaamSim can create and destroy about 110,000 entities per second. At this rate, only 9 seconds of processing time are required for a million entities - an insignificant amount. However, one hundred million would require 15 minutes and one billion would require 2.5 hours. Simulations involving such large numbers of entities will require more efficient entity creation and destruction methods or a model design that pools and recycles the entities instead of destroying them.
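The pooling idea mentioned above can be sketched as follows. This is a generic design for illustration, not the actual implementation used by SLX or JaamSim:

```python
class EntityPool:
    """Recycle used entities instead of destroying and re-creating them."""

    def __init__(self):
        self._free = []

    def acquire(self):
        if self._free:
            entity = self._free.pop()
            entity.clear()          # reset the recycled entity's state
            return entity
        return {}                   # pool empty: create a fresh entity

    def release(self, entity):
        self._free.append(entity)   # recycle instead of destroying
```

With this design, a model that cycles entities at a steady rate allocates only as many entities as are ever in flight at once, avoiding repeated creation and destruction.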

Monday, May 19, 2014

Benchmarking Discrete Event Simulation Software


Over the last several months Harvey Harrison and Matt Chudleigh have been working hard to increase JaamSim's execution speed. With the release this week of JaamSim2014-20, I am pleased to say that JaamSim is now faster than both Arena and Simio.

Before starting our optimisation process, we searched the internet for any information on benchmarks for discrete event simulation software. Surprisingly, there are no published benchmarks of any kind that compare the execution speeds of the various simulation packages. Some articles compare the features and design of the software packages, but provide no quantitative data. To the best of my knowledge, this blog post is the first to give numerical results.

The first step in any optimisation effort is to decide what to measure. For discrete event software, there are three activities that are likely to be slow enough to bottleneck execution speed:
  1. Event scheduling and processing.
  2. Creating and destroying entities.
  3. Executing blocks that do not involve simulated time.

Event Scheduling and Processing Benchmark

This post focuses on the first of these activities -- scheduling and executing events. The following model was devised to measure this time.
Benchmark Model
In this model, N entities are created at time zero and fed into a delay activity whose duration is selected randomly from the range 0 to 2N seconds (uniformly distributed). On completion of the delay, each entity is routed back to re-execute the delay again. This design results in an average of one entity per second completing the delay and an average of N events on the future event list. The use of the uniform distribution ensures that each new event is inserted in a random position in the future event list.

The average time to create, schedule, and execute an event was measured by running the model for 60 seconds of real time (using a stopwatch) and counting the number of times the delay activity was completed. The effect of the computer speed was allowed for by converting the calculated time into clock cycles. All measurements were made using my laptop computer which has a second generation (Sandy Bridge) Core i5 processor running at 2.5 GHz.
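The benchmark model can be sketched in Python as follows -- a stand-in for the real models, with the future event list held in a binary heap. As described above, delay completions come out at roughly one per simulated second:

```python
import heapq
import random
from itertools import count

def delay_loop_benchmark(n_entities, end_time, seed=1):
    """N entities each repeat a Uniform(0, 2N)-second delay, giving an
    average of N events on the future event list and about one delay
    completion per simulated second. Returns the completion count."""
    rng = random.Random(seed)
    seq = count()                     # tie-breaker for simultaneous events
    fel = []                          # future event list as a min-heap
    for _ in range(n_entities):       # N entities created at time zero
        heapq.heappush(fel, (rng.uniform(0, 2 * n_entities), next(seq)))
    completions = 0
    while fel:
        t, _ = heapq.heappop(fel)
        if t > end_time:
            break
        completions += 1              # delay completed; loop the entity back
        heapq.heappush(fel, (t + rng.uniform(0, 2 * n_entities), next(seq)))
    return completions
```

Each new event lands at a random position in the event list, exercising the insertion cost that this benchmark is designed to measure.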

Performance Results

We began by benchmarking the initial version of JaamSim (2014-11) and comparing the results to those for Arena and Simio. The following graph shows the execution time per event for the three software packages over a range of future event list sizes. This is the "Before" picture for the simulation weight loss clinic.
Before Optimisation
Clearly, JaamSim2014-11 was quite a lot slower than both Arena and Simio for event scheduling and processing. This slowness was caused by its use of system threads to control simulated time. Each event required two context switches -- passing control from one thread to another -- a relatively time consuming process for the computer.

Next we began optimising. The following graph shows the performance of two subsequent releases of JaamSim.
After Optimisation
The biggest gain came from re-designing the event processing engine so that events could be executed without context switching. This was no small task and preparations had been made for this change over several previous months. JaamSim2014-16 was the first release to implement the new system and the event processing time immediately dropped from 67,000 clock cycles per event to 1,200 clock cycles per event (with an empty future events list). For comparison, Simio required 1,800 clock cycles per event.

The next step was to improve performance when there are a large number of events on the future events list. JaamSim2014-16 used a simple array structure that required O(N) clock cycles to insert a new event. This worked well up to about 100 events on the future event list, but became very inefficient for larger numbers of events. JaamSim2014-18 (not shown) improved this design enough to extend the range of efficient operation up to 1,000 future events, but became too slow at 10,000 events.

Finally, in JaamSim2014-20, we replaced the future events array with a red-black tree, a data structure that requires only O(log N) clock cycles to insert a new event. With this change, JaamSim provides excellent performance to at least 10,000 future events.
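The difference between the two designs can be illustrated in Python. A binary heap (`heapq`) is used here as a stand-in for the red-black tree, since both provide O(log N) insertion:

```python
import heapq
from bisect import insort

def insert_sorted_array(fel, event_time):
    """O(N) insertion: a sorted-array future event list like the one in
    JaamSim2014-16; insort may shift up to N elements to keep order."""
    insort(fel, event_time)

def insert_heap(fel, event_time):
    """O(log N) insertion: a binary heap as a stand-in for the
    red-black tree adopted in JaamSim2014-20."""
    heapq.heappush(fel, event_time)

# Both structures yield events in time order:
fel = []
for t in [5.0, 1.0, 3.0]:
    insert_heap(fel, t)
print([heapq.heappop(fel) for _ in range(3)])  # [1.0, 3.0, 5.0]
```

The sorted array is actually faster for small event lists (less constant overhead), which matches the observation above that it worked well up to about 100 future events before the O(N) cost took over.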

Concluding Remarks

As of JaamSim2014-20, we have improved event scheduling and processing to the extent that it is no longer a significant factor in model execution speed. As subsequent posts will show, the execution time for a JaamSim delay block is now about the same as that for any of the other blocks in the Basic Objects palette.

So far, we have only benchmarked Arena and Simio, but more will come over the next 6 months. In particular, I'd like to see how well AnyLogic, FlexSim, and ExtendSim perform. Assistance from any readers who are familiar with these software packages would be most welcome. The hardest part of benchmarking other software is to learn enough to implement the various benchmarks in a way that gets the best from the software. In the case of Simio, I am grateful to Alan Sagan for providing an implementation that performed ten times faster than my first attempt.

Tuesday, May 13, 2014

The Birth of JaamSim

JaamSim has its origin in the simulation models we prepare for the oil & gas and mining industries. These models simulate the movement of products such as crude oil, LNG, coal, and iron ore from production through marine terminals and ships to their final destinations. You can view some videos of these models using the link from the JaamSim webpage or at our YouTube channel: JavaSimulation.

Long before JaamSim, we wrote our models in commercial off-the-shelf software.  In the early 1980's, we used GPSS, followed a few years later by SLAM. In the 1990's, we used Audition, a simulation package developed in part by the National Research Council of Canada. When Audition came to the end of its life a decade later -- it was incompatible with Windows 2000 and NT -- we had several large models that needed to be converted to a new simulation platform. What to do?

We started by reviewing the lessons we had learned from our previous simulation packages:
  • Most of our modelling effort goes into writing custom simulation objects. With SLAM, we had a small framework that was built from its standard components and a very large body of FORTRAN code that did the real work. SLAM did little more than manage the simulation clock and execute events.
  • A non-standard programming language is a bad idea. Audition was quite an advanced programming language in its day -- basically Smalltalk with extensions to do discrete-event simulation. As good as it was, it had a small user base and lacked many basic features. We spent a huge amount of effort writing code to do things that would already be provided in a standard programming language.
  • Simulation packages tend to become obsolete after ten or so years. Newer, more capable software appears on the scene and the old software becomes a legacy product that receives only token development. This is a healthy process for the simulation industry, but it leaves the model developer with a lot of software to rewrite.
As a result of this experience, we decided to adopt a mainstream object-oriented programming language and build our own simulation engine. We chose Java over C++ because it is a simpler language and has a more productive programming environment. This proved to be a good decision. In eight months, we had constructed our simulation engine, written a compiler to translate Audition code into Java, and converted and verified all our models.

Our new simulation program, the "Transportation Logistics Simulator" (TLS), was still an application-specific model for product transportation. TLS evolved with each new project and over the years acquired all the features of a modern simulation package.

JaamSim was born in 2010 when we separated TLS into a general purpose simulator and a set of application-specific components. In 2011, we released JaamSim as free, open source software and in 2013 introduced it to the simulation community with a Winter Simulation Conference paper.

JaamSim now consists of over 50,000 lines of executable code and Ohloh estimates that it represents about 13 years of effort, or about one million dollars of programmer time. It is one of the most active open-source projects that they track.


Wednesday, May 7, 2014

Why is JaamSim free?

The first question most people ask me about JaamSim is "Why are you doing this?" This is a good question. We are going to a lot of trouble to develop and promote software that generates no revenue for us.

There is a personal reason and a business reason for JaamSim.  I'll start with the personal one. I've been a simulation consultant for more than thirty years now, and I would like to make a lasting contribution to the field. Over the years, I've developed strong opinions on how simulation models should be constructed and on the role simulation technology should play in the world. I'll say more about these topics in future posts, but suffice it to say that the present generation of commercial off-the-shelf simulation software is not suitable for this vision and that JaamSim is designed to fill this gap. JaamSim embodies all our ideas on how simulation software should work.

The business reason for JaamSim is simple. We want to be recognized as a leader in simulation technology to help us sell our consulting services. It's no good for us to claim that our software is the most sophisticated because everyone else does this too. Clients have no way to make a technical evaluation and simply assume that the most popular software is the best and safest choice. The only way for us to prove the value of our software is for it to be widely used by other simulation professionals. We have made JaamSim free and open source so that more people will try it out and hopefully adopt it as their preferred software. At the end of the day, our reputation is more important to us than any money that JaamSim might have generated as a commercial offering.