Transient analysis is often cited as an algorithm that is not suitable for parallelization. The analysis is inherently sequential: previous results are required before a new calculation can be performed. However, there is one point where parallelization can logically be employed: the load operation.
In dc and transient analysis, the load operation is performed before each iteration. During the load operation, the device-specific code is run using the previous iteration results, and the circuit matrix and right-hand side (rhs) vectors are loaded with the computed values. As the ordering of devices is unimportant, different threads can be called upon to perform the load operation for different devices, in parallel. The matrix and rhs loading operations are engineered to be atomic, so that different threads do not interfere with one another while accessing these common resources.
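As an illustration only, the load scheme described above can be sketched in Python. Everything in this sketch is hypothetical: the Circuit class, the evaluate and load functions, and the toy device model are not taken from the simulator. A single lock stands in for whatever mechanism actually makes the matrix and rhs updates atomic. Python threads will not show a real speedup here; the sketch only shows the structure, with device evaluation done outside the lock and only the accumulation serialized.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Circuit:
    """Hypothetical container for the shared circuit matrix and rhs vector."""

    def __init__(self, size):
        self.matrix = [[0.0] * size for _ in range(size)]
        self.rhs = [0.0] * size
        self._lock = threading.Lock()

    def stamp(self, matrix_terms, rhs_terms):
        # One lock guards both shared structures, so each device's
        # contribution is accumulated without interference.
        with self._lock:
            for row, col, value in matrix_terms:
                self.matrix[row][col] += value
            for row, value in rhs_terms:
                self.rhs[row] += value


def evaluate(device, prev):
    """Toy nonlinear two-terminal device, linearized about the previous iterate.

    Assumed model: I(v) = g0 * (v + v**3 / 3), with v = prev[a] - prev[b].
    """
    a, b, g0 = device
    v = prev[a] - prev[b]
    current = g0 * (v + v ** 3 / 3.0)
    g = g0 * (1.0 + v * v)          # dI/dv at the previous iterate
    i_eq = current - g * v          # equivalent current source
    matrix_terms = [(a, a, g), (b, b, g), (a, b, -g), (b, a, -g)]
    rhs_terms = [(a, -i_eq), (b, i_eq)]
    return matrix_terms, rhs_terms


def load(circuit, devices, prev, helpers=0):
    """Load every device; device order does not affect the accumulated sums."""

    def work(device):
        matrix_terms, rhs_terms = evaluate(device, prev)  # no lock needed
        circuit.stamp(matrix_terms, rhs_terms)            # serialized update

    if helpers <= 0:
        for device in devices:
            work(device)
    else:
        # The pool of helpers + 1 workers stands in for the main thread
        # plus its helper threads.
        with ThreadPoolExecutor(max_workers=helpers + 1) as pool:
            list(pool.map(work, devices))
```

Because each contribution is simply added to the shared structures, the serial and threaded loads should agree to within floating-point rounding.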
The loading operation dominates the simulation time in many circuits, particularly when complex device models such as BSIM are used. These circuits benefit most from multi-threaded loading.
When enabled, multi-threaded loading is used in dc analysis (including operating point analysis, and the operating point calculation that precedes ac small-signal analysis) and in transient analysis. By default, multi-threaded loading is disabled. It is enabled by setting the loadthrds variable to an integer value of 1 or larger. This can be done in a .options line in the SPICE deck, interactively from the command line using the set command, or graphically from the General page of the Simulation Options panel, found in the Tools menu.
The loadthrds variable sets the number of helper threads that will be created to assist the main thread in evaluating device code. If 0 or not set, no helper threads are used.
Multiple threads will not necessarily make simulations run faster, and can in fact have the opposite effect. The latter is sadly true in the Josephson circuits tested thus far. The problem is that multi-threading adds a small amount of overhead, and the load function may be called hundreds of thousands of times in these simulations. The model calculation for Josephson junctions (JJs) runs very quickly, so the overhead becomes significant. The same is true for other simple devices. Work to improve this situation is ongoing.
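The trade-off between threading overhead and model evaluation cost can be made concrete with a toy cost model. This is a rough sketch, not a measurement: the function names, the assumption that device work divides evenly among threads, and the fixed per-call overhead are all assumptions made for illustration, and the time units are arbitrary.

```python
def load_time(num_devices, eval_cost, helpers, overhead_per_call):
    """Estimated time for one call of the load function (arbitrary units).

    Assumes the device evaluations divide evenly among the main thread
    and the helper threads, and that threading adds a fixed cost per call.
    """
    serial_time = num_devices * eval_cost
    if helpers <= 0:
        return serial_time
    return serial_time / (helpers + 1) + overhead_per_call


def speedup(num_devices, eval_cost, helpers, overhead_per_call):
    """Ratio of single-threaded to multi-threaded load time."""
    single = load_time(num_devices, eval_cost, 0, overhead_per_call)
    multi = load_time(num_devices, eval_cost, helpers, overhead_per_call)
    return single / multi
```

In this model, a fast device model makes the fixed overhead the dominant term, so the speedup falls below 1, while a costly device model makes the overhead negligible. Since the load function is called once per iteration, any loss per call is multiplied by the number of calls.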
On the other hand, if there is a lot of computation in the device model, this will dominate the overhead and we see shorter load times. This is true for BSIM MOS models, in circuits with more than about 20 transistors. Such simulations can run 2-3 times faster than with a single thread. One should experiment with the value of the loadthrds variable. Most likely, for best performance, the value plus the main thread should equal the number of available hardware threads, which is often twice the number of available CPU cores when the processor runs two hardware threads per core.
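A starting value for experimentation can be derived from the machine's hardware thread count. The helper names below are hypothetical, and the name=value form of the .options line is assumed here rather than taken from the text above. The sketch relies on Python's os.cpu_count(), which reports logical CPUs and may return None when the count cannot be determined.

```python
import os


def suggested_loadthrds(hardware_threads=None):
    """Helper threads such that helpers plus the main thread equals the
    number of hardware threads. This is a starting point, not a tuned value."""
    if hardware_threads is None:
        hardware_threads = os.cpu_count() or 1  # fall back to 1 if unknown
    return max(hardware_threads - 1, 0)


def options_line(helpers):
    """Format a deck line setting loadthrds (syntax assumed for illustration)."""
    return f".options loadthrds={helpers}"


if __name__ == "__main__":
    print(options_line(suggested_loadthrds()))
```

On a machine with 8 hardware threads this suggests 7 helper threads. For circuits built from simple devices, a value of 0 (the default) may still be the faster choice.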