Efficiency Considerations

Gaussian has been designed to work efficiently given a variety of computer configurations. In general, the program attempts to select the most efficient algorithm given the memory and disk constraints imposed upon it. Since Gaussian does offer a wide choice of algorithms, an understanding of the possibilities and tradeoffs can help you to achieve optimal performance.

Before proceeding, however, let us emphasize two very important points:

Estimating Calculation Memory Requirements

With Gaussian 09 and current computers, the default memory size of 256 MB is sufficient for most job types and methods, employing basis functions through g functions. If your basis set includes h or higher angular momentum functions (e.g. cc-pVQZ), then you may need to increase the amount of memory you give to the job. The following formula can be used to estimate the memory requirement of various types of Gaussian jobs (in 8-byte words):

    M + 2(NB)2

where NB is the number of basis functions used in the calculation, and M is a minimum value that is usually generously covered by the default memory size. Note that 1 MW = 1,048,576 words (= 8,388,608 bytes). Larger amounts of memory may be required for derivatives of contracted high angular momentum functions (f and above).

The remainder of this section is designed for users who wish to understand more about the tradeoffs inherent in the various choices in order to obtain optimal performance for an individual job, not just good overall performance. Techniques for both very large and small jobs will be covered. Additional related information may be found in reference [Schlegel91].

Memory Requirements for Parallel Calculations

When using multiple processors with shared memory, a good estimate of the memory required is the amount of memory required for a single processor job. For distributed memory calculations (i.e., those performed via Linda), the amount of memory specified in %Mem should be equal to or greater than the value for a single processor job.

In Gaussian 09, these two parallelization methods can be combined. For example, you might use the following directive in order to run a job on 8 CPUs located on 4 two-headed shared memory multiprocessors:

%Mem=128MW                           Memory required by each multiprocessor.
%LindaWorkers=sysa,sysb,sysc,sysd    Specify the four Linda workers: all multiprocessor system.
%NProcShared=2                       Use two shared memory processors on each multiprocessor computer.

SCF Procedure

In order to speed up direct HF and DFT calculations, the iterations are done in two phases:

This approach is substantially faster than using full integral accuracy throughout without slowing convergence in the great majority of cases tested so far. In the event of difficulties, full accuracy of the integrals throughout can be requested using SCF=NoVarAcc. See the discussion of the SCF keyword for more details.

Problem Convergence Cases

The default SCF algorithm uses a combination of two Direct Inversion in the Iterative Subspace (DIIS) extrapolation methods: EDIIS and CDIIS. EDIIS [Kudin02] uses energies for extrapolation, and it dominates the early iterations of the SCF convergence process. CDIIS, which performs extrapolation based on the commutators of the Fock and density matrices, handles the latter phases of SCF convergence. This algorithm is very reliable, and previously troublesome SCF convergence cases now almost always converge with the default algorithm. For the few remaining pathological convergence cases, Gaussian 09 offers Fermi broadening and damping in combination with CDIIS (including automatic level shifting).

These are the available alternatives if the default approach fails to converge (labeled by their corresponding keyword):

SCF=Fermi
Requests temperature broadening during early iterations [Rabuck99], combined with CDIIS and dynamic damping of early SCF iterations. This is the first choice when there are SCF convergence problems.

SCF=QC
This is quadratically convergent SCF, based on the method of Bacskay [Bacskay81]. Since it combines linear minimizations with the Newton-Raphson algorithm suggested by Bacskay, it is guaranteed to reach a stationary point eventually. Typically, SCF=QC is about twice as expensive as conventional SCF. Since SCF=QC is reliable and can be used for direct SCF, it is usually the second choice if convergence problems are encountered. It can be used for RHF and UHF, but not for complex or ROHF.

Guess=Alter
Sometimes convergence difficulties are a warning that the initial guess has occupied the wrong orbitals. The guess should be examined, especially as to the symmetries of the occupied orbitals. Guess=Alter can be used to modify the orbitals selected for occupation.

SCF(MaxCyc=N)
Increases the total number of SCF iterations to N. Note that merely increasing the number of SCF cycles for the default algorithm is rarely helpful.

These approaches all tend to force convergence to the closest stationary point in the orbital space, which may not be a minimum with respect to orbital rotations. A stability calculation can be used to verify that a proper SCF solution has been obtained (see the Stable keyword). Note also that you should verify that the final wavefunction corresponds to the desired electronic state, especially when using Guess=Alter.

The freqmem utility program returns the optimal memory size for different parameters of frequency calculation (i.e., the amount required to perform the major steps in a single pass).

MP2 Energies, Gradient and Frequencies

Four algorithms are available for MP2, but most of the decision-making is done automatically by the program. The critical element of this decision making is the value of MaxDisk, which should be set according to your particular system configuration (see chapter 3). It indicates the maximum amount of disk space available in words. If no value is specified for MaxDisk, either in the route section or in the Default.Route file, Gaussian will assume that enough disk is available to perform the calculation with no redundant work, which may not be the case for larger runs. Thus, specifying the amount of available memory and disk is by far the most important way of optimizing performance for MP2 calculations. Doing so allows the program to decide between the various available algorithms, selecting the optimal one for your particular system configuration. This is best accomplished with -M- directive and MaxDisk keyword in the Default.Route file (although MaxDisk and %Mem may be included in the input file).

Higher Correlated Methods

The correlation methods beyond MP2 (MP3, MP4, CCSD, CISD, QCISD, etc.) all require that some transformed (MO) integrals be stored on disk and thus (unlike MP2 energies and gradients) have disk space requirements that rise quartically with the size of the molecule. There are, however, several alternatives as to how the transformed integrals are generated, how many are stored, and how the remaining terms are computed.

The default in Gaussian is a semi-direct algorithm. The AO integrals may be written out for use in the SCF phase of the calculation or the SCF may be done directly or in-core. The transformation recomputes the AO integrals as needed and leaves only the minimum number of MO integrals on disk (see below). The remaining terms are computed by recomputing AO integrals.

The following points summarize the effect of MaxDisk for post-SCF methods:

CIS and TD Energies and Gradients

In addition to integral storage selection, the judicious use of the restart facilities can improve the economy of CIS and TD calculations.

Integral Storage

Excited states using CI with single excitations can be done using five methods (labeled by their corresponding option to the CIS keyword). Note that only the first two options are available for the TD method:

Direct
Solve for the specified number of states using iterative diagonalization, forming the product vectors from two-electron integrals computed as needed. This algorithm reduces memory and disk requirements to O(N2). This is the default for TD.

InCore
Requests that the AO Raffenetti combinations be held in memory. In-core is quite efficient, but is only practical for small molecular systems or large memory computers as N4/4 words of memory are required. This approach is used automatically if there is sufficient memory available.

MO
Solve for the specified number of states using iterative (Davidson) diagonalization, forming the product vectors using MO integrals. This is the fastest method and is the default. This algorithm is an efficient choice up to about 150 basis functions, depending on the number of occupied orbitals. The more occupied orbitals, the sooner the direct algorithm should be used. Since only integrals involving two virtuals are needed (even for gradients) an attempt is made to obey MaxDisk. The minimum disk required is about 4O2N2 (6O2N2 for open-shell). This is the default for CIS.

Restarting Jobs and Reuse of Wavefunctions

CIS and TD jobs can be restarted from a Gaussian checkpoint file. This is of limited use for smaller calculations, which may be performed in the MO basis, as new integrals and transformation must be done, but is invaluable for direct CIS. If a direct CIS job is aborted during the CIS phase, then SCF=Restart should be specified in addition to CIS=Restart or TD=Restart, as the final SCF wavefunction is not moved to its permanent location (suitable for Guess=Read) until the entire job step (or optimization step) completes.

CIS and TD Excited State Densities

If only density analysis is desired, and the excited states have already been found, the CIS density can be recovered from the checkpoint file, using Density=(Check,Current) Guess=Only, which recovers whatever generalized density was stored for the current method (presumably CIS) and repeats the population analysis. Note that the one-particle (unrelaxed) density as well as the generalized (relaxed) density can be examined, but that dipole moments and other properties at the CIS level are known to be much less accurate if the one-particle density is used (i.e., if the orbital relaxation terms are neglected) [Foresman92, Wiberg92]. Consequently, the use of the CIS one-particle density is strongly discouraged, except for comparison with the correct density and with other programs that cannot compute the generalized density.

Separate calculations are required to produce the generalized density for several states, since a CPHF calculation must be performed for each state. To do this, first solve for all the states and the density for the first excited state:

# CIS=(Root=1,NStates=N) Density=Current

if N states are of interest. Then do N-1 additional runs, using a route section of the form:

CIS=(Read,Root=M,NStates=N) Density=Current

for states M=2 through N.

Pitfalls for Open-Shell Excited States

Since the UHF reference state is not an eigenfunction of S2, neither are the excited states produced by CIS or TD [Foresman93].

Stability Calculations

Tests of Triplet and Singlet instabilities of RHF and UHF and restricted and unrestricted DFT wavefunctions can be requested using the Stable keyword. Stability calculations can be restarted as described above for CIS.

CASSCF Efficiency

The primary challenge in using the CASSCF method is selecting appropriate active space orbitals. There are several possible tactics:

In all cases, a single-point calculation should be performed before any optimization, so that the converged active space can be checked to ensure that the desired electrons have been correlated before proceeding. There are additional considerations in solving for CASSCF wavefunctions for excited states (see the discussion of the CASSCF keyword for details).

CASSCF Frequencies

CASSCF frequencies require large amounts of memory. Increasing the amount of available memory will always improve performance for CASSCF frequency jobs (the same is not true of frequency calculations performed with other methods). These calculations also require O2N2 disk space.


Last updated on: 9 Jun 2009