11.10.2 Parallel execution in Abaqus/Standard

Products: Abaqus/Standard, Abaqus/CAE

Overview

Parallel execution in Abaqus/Standard:

reduces run time for large analyses;
is available for shared memory computers for the Lanczos eigensolver; and
is available for shared memory computers and computer clusters for the element operations, direct sparse solver, and iterative linear equation solver.

Parallel equation solution with the direct sparse solver

The direct sparse solver can be run in parallel on both shared memory computers and computer clusters. On shared memory computers, thread-based parallelization is used; on computer clusters, a hybrid of MPI and thread-based parallelization is used. The direct sparse solver cannot be used on computer clusters if:

the analysis also uses the iterative linear equation solver or the Lanczos eigensolver, or
the analysis requires features for which MPI-based parallel execution of element operations is not supported.

In addition, the direct sparse solver cannot be used on computer clusters for analyses that include any of the following:

multiple load cases with changing boundary conditions (“Multiple load case analysis,” Section 6.1.3), and
the quasi-Newton nonlinear solution technique (“Convergence criteria for nonlinear problems,” Section 7.2.3).

To execute the parallel direct sparse solver on computer clusters, the environment variable mp_host_list must be set to a list of host machines (see “Using the Abaqus environment settings,” Section 3.3.1). MPI-based parallelization is used between the machines in the host list. If more than one processor is available on a machine in the host list, thread-based parallelization is used within that host machine. For example, if the environment file has the following:

cpus=8
mp_host_list=[['maple',4],['pine',4]]

Abaqus/Standard will use four processors on each host through thread-based parallelization. A total of two MPI processes (equal to the number of hosts) will be run across the host machines so that all eight processors are used by the parallel direct sparse solver.
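The process and thread counts in this example can be checked with a short Python sketch. The helper name summarize_host_list is hypothetical and is not part of Abaqus. It only restates the rule described above (one MPI process per host, thread-based parallelization within each host) and assumes that every processor in the host list is used.

```python
# Hypothetical helper, not part of Abaqus: restates the hybrid scheme
# described in the text for a given mp_host_list setting.
def summarize_host_list(mp_host_list):
    """Return (MPI process count, threads per host, total processors)."""
    # One MPI process per host machine in the list.
    mpi_processes = len(mp_host_list)
    # Within each host, the listed processors are used as threads.
    threads_per_host = {host: count for host, count in mp_host_list}
    total_processors = sum(threads_per_host.values())
    return mpi_processes, threads_per_host, total_processors


# The host list from the example environment file above.
mp_host_list = [['maple', 4], ['pine', 4]]
print(summarize_host_list(mp_host_list))
```

For the host list above, the sketch reports two MPI processes, four threads on each host, and eight processors in total, matching the cpus=8 setting.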

Input File Usage:

Enter the following input on the command line:

abaqus job=job-name cpus=n

For example, the following input will run the job “beam” on two processors:

abaqus job=beam cpus=2

Abaqus/CAE Usage:

Job module: job editor: Parallelization: toggle on Use multiple processors, and specify the number of processors, n

Memory requirements for the parallel direct sparse solver

The parallel direct sparse solver processes multiple fronts in parallel in addition to parallelizing the solution of individual fronts. Therefore, the parallel direct sparse solver requires more memory than the serial solver. The memory requirements cannot be predicted exactly because it is not known in advance which fronts will actually be processed simultaneously.

Parallel eigenvalue extraction with the Lanczos eigensolver

The Lanczos eigensolver uses thread-based parallelization; therefore, parallel execution of the Lanczos eigensolver is available only on shared memory computers. The number of solver threads is equal to the number of processors used for the analysis. Parallel execution of element operations is not supported with the Lanczos eigensolver.

Input File Usage:

Enter the following input on the command line:

abaqus job=job-name cpus=n

For example, the following input will run the job “beam” on two processors:

abaqus job=beam cpus=2

Abaqus/CAE Usage:

Job module: job editor: Parallelization: toggle on Use multiple processors, and specify the number of processors, n

Parallel equation solution with the iterative solver

Parallelization of the domain decomposition-based (DDM) iterative linear equation solver is achieved by mapping groups of domains to individual processors. To activate the parallel iterative solver, specify the number of CPUs for the job. Both MPI and thread-based parallelization modes are supported with the iterative solver. Parallel execution of element operations is supported only when the MPI-based parallel implementation of the iterative solver is used.

Input File Usage:

Enter the following input on the command line:

abaqus job=job-name cpus=n

For example, the following input will run the job “beam” on two processors with the domain-level parallelization method:

abaqus job=beam cpus=2

In this case half of the iterative solver domains will be mapped to each of the processors.
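The even split of domains between processors can be pictured with the following Python sketch. The function map_domains_to_processors is hypothetical. How Abaqus/Standard actually chooses which domains go to which processor is not described here, so the assignment of contiguous blocks and the zero-based numbering are assumptions made only for illustration.

```python
# Illustrative only: the real domain-to-processor assignment is internal
# to Abaqus/Standard. Contiguous blocks are an assumption.
def map_domains_to_processors(n_domains, n_cpus):
    """Split domain indices 0..n_domains-1 as evenly as possible over n_cpus."""
    base, extra = divmod(n_domains, n_cpus)
    mapping = {}
    start = 0
    for cpu in range(n_cpus):
        # The first `extra` processors take one additional domain.
        count = base + (1 if cpu < extra else 0)
        mapping[cpu] = list(range(start, start + count))
        start += count
    return mapping


# Eight domains on two processors: four domains per processor.
print(map_domains_to_processors(8, 2))
```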

Abaqus/CAE Usage:

Job module: job editor: Parallelization: toggle on Use multiple processors, and specify the number of processors, n

Parallel execution of the element operations in Abaqus/Standard

Parallel execution of the element operations is the default on all supported platforms. The standard_parallel command line option and environment variable can be used to control the parallel execution of the element operations (see “Using the Abaqus environment settings,” Section 3.3.1, and “Execution procedure for Abaqus/Standard and Abaqus/Explicit,” Section 3.2.2). If parallel execution of the element operations is used, the solvers also run in parallel automatically. For analyses using the direct sparse solver, thread-based parallelization of the element operations is used on shared memory computers and a hybrid MPI and thread-based parallelization scheme is used on computer clusters. For analyses using the DDM iterative solver, only MPI-based parallelization of element operations is supported.

When MPI-based parallelization of element operations is used, element sets are created for each domain and can be inspected in Abaqus/CAE. The sets are named STD_PARTITION_n, where n is the domain number.
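If the element set names of a model are available as a plain list of strings, the partition sets can be picked out by their naming pattern. The sketch below is a minimal illustration in standard Python. The function partition_domains and the sample set names are hypothetical, and no Abaqus scripting calls are used.

```python
import re

# Matches the naming pattern described above: STD_PARTITION_n.
_PARTITION_RE = re.compile(r"^STD_PARTITION_(\d+)$")


def partition_domains(set_names):
    """Map each domain number to its partition set name."""
    domains = {}
    for name in set_names:
        match = _PARTITION_RE.match(name)
        if match:
            domains[int(match.group(1))] = name
    return domains


# Hypothetical set names; only the STD_PARTITION_n entries are kept.
names = ["STD_PARTITION_1", "WHEEL", "STD_PARTITION_2"]
print(partition_domains(names))
```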

Parallel execution of the element operations (thread or MPI-based parallelization) is not supported for analyses that include any of the following procedures:

eigenvalue buckling prediction (“Eigenvalue buckling prediction,” Section 6.2.3),
natural frequency extraction (“Natural frequency extraction,” Section 6.3.5),
complex eigenvalue extraction (“Complex eigenvalue extraction,” Section 6.3.6),
mode-based linear dynamics (“Transient modal dynamic analysis,” Section 6.3.7; “Random response analysis,” Section 6.3.11; “Response spectrum analysis,” Section 6.3.10; “Subspace-based steady-state dynamic analysis,” Section 6.3.9; and “Mode-based steady-state dynamic analysis,” Section 6.3.8).

Parallel execution of element operations is available only through MPI-based parallelization for analyses that include any of the following:

steady-state transport (“Steady-state transport analysis,” Section 6.4.1),
implicit dynamic (“Implicit dynamic analysis using direct integration,” Section 6.3.2),
static linear perturbation (“General and linear perturbation procedures,” Section 6.1.2),
direct-solution steady-state dynamics (“Direct-solution steady-state dynamic analysis,” Section 6.3.4),
coupled temperature-displacement (“Fully coupled thermal-stress analysis,” Section 6.5.4),
crack propagation analysis (“Crack propagation analysis,” Section 11.4.3), and
contact iterations (“Contact iterations,” Section 7.1.2).

Analyses using the direct sparse solver and any of the procedures above that support only MPI-based parallelization of element operations can be run on computer clusters. However, only one processor per compute node is used for the element operations since thread-based parallelization is not supported.

Finally, parallel execution of the element operations is not supported for analyses that include any of the following:

adaptive meshing (“Defining ALE adaptive mesh domains in Abaqus/Standard,” Section 12.2.6),
co-simulation (“Co-simulation: overview,” Section 14.1.1),
element matrix output requests (“Element matrix output in Abaqus/Standard” in “Output,” Section 4.1.1),
import (“Transferring results between Abaqus analyses: overview,” Section 9.2.1),
matrices (“Defining matrices,” Section 2.10.1),
pressure penetration loading (“Pressure penetration loading,” Section 32.1.7),
substructures (“Substructuring,” Section 10.1),
alternative solution techniques except for the quasi-Newton method (“Approximate implementation” in “Fully coupled thermal-stress analysis,” Section 6.5.4; “Approximate implementation” in “Coupled thermal-electrical analysis,” Section 6.6.2; “Contact iterations,” Section 7.1.2; and “Specifying the separated method” in “Convergence criteria for nonlinear problems,” Section 7.2.3), and
finite-sliding contact in conjunction with the MPI-based iterative solver.

Input File Usage:

Enter the following input on the command line:

abaqus job=job-name cpus=n

Abaqus/CAE Usage:

Parallel execution of the element operations is not supported in Abaqus/CAE.

Memory management with parallel execution of the element operations

When the element operations are executed in parallel in Abaqus/Standard, the upper limit on memory use (see “Abaqus/Standard analysis” in “Managing memory and disk use in Abaqus,” Section 3.4.1) is the maximum amount of memory that can be allocated by each process.
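Because the limit applies to each process, the memory that a job may allocate in aggregate grows with the number of processes. The following sketch shows the arithmetic. The function name and the numbers are hypothetical, and it assumes that only MPI processes are counted, since threads within one process share that process's memory.

```python
# Hypothetical helper: worst-case aggregate allocation when the memory
# limit applies per process. Assumes n_mpi_processes counts MPI processes
# only; threads inside a process share its memory.
def aggregate_memory_upper_bound(limit_per_process_mb, n_mpi_processes):
    if n_mpi_processes < 1:
        raise ValueError("at least one process is required")
    return limit_per_process_mb * n_mpi_processes


# Example: a 2048 MB per-process limit with 4 MPI processes.
print(aggregate_memory_upper_bound(2048, 4))  # 8192
```

When several MPI processes run on the same machine, it may be worth comparing this aggregate figure with the physical memory of that machine.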

Transverse shear stress output for stacked continuum shells

The output variables CTSHR13 and CTSHR23 are currently not available when the element operations are executed in parallel in Abaqus/Standard. See “Continuum shell element library,” Section 25.6.8.

Consistency of results

Some physical systems (for example, systems that undergo buckling, material failure, or delamination) can be highly sensitive to small perturbations. It is well known, for instance, that the experimentally measured buckling loads and final configurations of a set of seemingly identical cylindrical shells can show significant scatter due to small differences in boundary conditions, loads, initial geometries, etc. When simulating such systems, the physical sensitivities seen in an experiment can be manifested as sensitivities to small numerical differences caused by finite precision effects. Finite precision effects can lead to small numerical differences when running jobs on different numbers of processors. Therefore, when simulating physically sensitive systems, you may see differences in the numerical results (reflecting the differences seen in experiments) between jobs run on different numbers of processors. To obtain consistent simulation results from run to run, use the same number of processors for every run.
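The finite precision effect can be reproduced outside Abaqus with a few lines of Python. The sketch below is not how Abaqus/Standard assembles or solves equations. It only shows that floating-point addition is not associative, so summing the same numbers in one partition or in two partitions can give different results. The function partitioned_sum and the test values are chosen purely for illustration.

```python
def partitioned_sum(values, n_parts):
    """Sum values in n_parts contiguous chunks, then add the partial sums."""
    chunk = -(-len(values) // n_parts)  # ceiling division
    partials = []
    for start in range(0, len(values), chunk):
        partial = 0.0
        # Explicit loop: plain left-to-right floating-point addition.
        for value in values[start:start + chunk]:
            partial += value
        partials.append(partial)
    total = 0.0
    for partial in partials:
        total += partial
    return total


# Values chosen so that rounding depends on the order of the additions.
values = [1e16, 1.0, -1e16, 1.0]
serial = partitioned_sum(values, 1)   # one partition
two_way = partitioned_sum(values, 2)  # two partitions
print(serial, two_way, serial == two_way)
```

The two sums differ even though the input values are identical. In a well-conditioned problem such differences stay near machine precision, but in a physically sensitive system they can grow into visibly different solutions.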