Atomistic Computer Simulations

Self-interaction correction in DFT

Fri, 12 Aug 2016 00:00:00 +0000

One of the biggest problems facing DFT is that of self-interaction: each electron effectively interacts with itself, because the potential derives from the total charge density of the system. This is not an issue for the exact (unknown) density functional, or for Hartree-Fock, but is the cause of significant error in many DFT functionals. Approaches such as DFT+U[1],[2],[3] and hybrid functionals (far too many to reference !) are aimed in part at fixing this problem.

Probably the earliest attempt to remove this error is the self-interaction correction of Perdew and Zunger[4] which corrects the potential for each Kohn-Sham orbital, complicating the calculation considerably over a standard DFT calculation. (Ironically, this paper, which has over 11,000 citations, is best known for its appendix C, where a parameterisation of the LDA XC energy is given.) However, this process is notoriously slow to converge and is not widely used.
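The size of the error is easiest to see for a one-electron system, where any interaction of the electron with itself is spurious. The sketch below is my own illustration rather than anything taken from the papers discussed: it integrates the Hartree energy and the exchange-only local spin density approximation for the exact hydrogen 1s density, in atomic units. Correlation is left out, and the function names are hypothetical.

```python
import math

def radial_integral(f, r_max=40.0, n=40000):
    """Midpoint-rule integral of 4*pi*r^2 f(r) from 0 to r_max (atomic units)."""
    h = r_max / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        total += 4.0 * math.pi * r * r * f(r)
    return total * h

def density(r):
    # Exact hydrogen 1s density, n(r) = exp(-2r)/pi
    return math.exp(-2.0 * r) / math.pi

def hartree_potential(r):
    # Analytic electrostatic potential generated by the 1s density
    return 1.0 / r - math.exp(-2.0 * r) * (1.0 + 1.0 / r)

# Hartree energy: the electron repelling its own charge cloud
hartree = 0.5 * radial_integral(lambda r: density(r) * hartree_potential(r))

# LDA exchange for a fully spin-polarised density:
# E_x[n, 0] = -2^(1/3) * C_x * integral of n^(4/3)
c_x = 0.75 * (3.0 / math.pi) ** (1.0 / 3.0)
e_x_lsda = -(2.0 ** (1.0 / 3.0)) * c_x * radial_integral(
    lambda r: density(r) ** (4.0 / 3.0))

# For one electron the exact exchange energy is -hartree, so this should be zero
self_interaction = hartree + e_x_lsda
print(f"Hartree {hartree:.4f} Ha, LSDA exchange {e_x_lsda:.4f} Ha, "
      f"residual {self_interaction:.4f} Ha")
```

In Hartree-Fock the exchange term cancels the Hartree energy exactly for a single electron; the residual left by the approximate functional is the kind of orbital-by-orbital quantity that the Perdew-Zunger scheme subtracts.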

A recent paper[5] showed that, even for isolated molecules, complex orbitals were required to achieve convergence, and this approach has now been tested for atomisation energies of a standard set of 140 molecules[6]. The tests compare the new complex SIC implementation against the standard, real implementation, as well as various GGAs, hybrid functionals and meta-GGAs. The complex SIC, when coupled with the PBEsol functional[7], gives good results (though ironically the PBEsol functional was developed to improve PBE for solids). Not surprisingly, the best results are from hybrids, but meta-GGA improves the energies almost as well.

This study highlights the problem with DFT at the moment: there are many different approaches, which often work well for specific problems. SIC is cheaper than hybrid calculations, and can be important for charge transfer problems (and Rydberg states). The results for convergence and complex orbitals are interesting, but based on these results, I would use meta-GGA for atomisation energies, as a good compromise between accuracy and cost (almost the same as GGA).

[1] Phys. Rev. B 52, R5467 (1995) DOI:10.1103/PhysRevB.52.R5467

[2] Phys. Rev. B 57, 1505 (1998) DOI:10.1103/PhysRevB.57.1505

[3] Int. J. Quantum Chemistry 114, 14 (2014) DOI:10.1002/qua.24521

[4] Phys. Rev. B 23, 5048 (1981) DOI:10.1103/PhysRevB.23.5048

[5] J. Chem. Theory Comput. 12, 3195 (2016) DOI:10.1021/acs.jctc.6b00347

[6] J. Chem. Theory Comput. in press (2016) DOI:10.1021/acs.jctc.6b00622

[7] Phys. Rev. Lett. 100, 136406 (2008) DOI:10.1103/PhysRevLett.100.136406

Different approaches to creating (meta-GGA) DFT functionals

Tue, 31 May 2016 00:00:00 +0000

Within the DFT community, John Perdew’s idea of the Jacob’s ladder of accuracy[1] starts with LDA, moves to GGAs with the inclusion of the gradient of the electron density, and is further extended (to rungs three and four in the ladder analogy) with meta-GGA, where kinetic energy density is added, and hybrids, where exact exchange plays a role. Although meta-GGAs have been around for ten to fifteen years, they are only starting to become widely used. I will compare two examples which both seem promising, but also encapsulate the two most common approaches to functional creation.

Meta-GGAs are promising because the kinetic energy density allows some discrimination between areas with one or two electrons (in this way, they are similar in some ways to the electron localisation function, or ELF, which can be used to analyse bonding[2]). This gives some hope that they may be able to fit both strong and weak bonding as well as possibly mitigating the self-interaction error that plagues DFT.

The Minnesota meta-GGA MN15-L[3] takes a very complex functional form with 58 parameters and a non-separable form for the exchange-correlation (adding an extra correlation functional to this) and fits it to a portion of a very large database (I estimate that there are at least 900 entries in the database, covering many different chemical properties, specifically including solid state properties, transition barriers and weak interactions). The resulting functional is local (no hybrid or non-local van der Waals terms were included) and produces extremely small errors when compared to those parts of the database which were not used in fitting. Notably, it out-performs functionals which do include exact exchange and van der Waals terms.

By contrast, the SCAN functional[4] uses only seven parameters (close to, if not at, the minimal number for a meta-GGA) and includes various physically motivated norms and constraints for the electron gas. The early progress in GGAs was made by satisfying important constraints, so this is seen as a good route to reliability. This functional was also tested on various databases, particularly focussing on solid-state and weak interactions. It has excellent agreement with these, though was published before the MN15-L functional so is not compared directly. In a follow-up paper[5] the performance of the SCAN functional for band gaps is found to be good, though it does not calculate Kohn-Sham gaps, but gaps within a generalized Kohn-Sham theory (a distinction which I don’t have time to discuss here; I may write another blog on this, as it is relevant to hybrid functionals among other things.)

What can we learn from this ? Both functionals perform well in the tests which are published. The functional that you choose will depend in part, as always, on which system you wish to study; however, both of these functionals show some promise in being widely applicable. Your choice will also depend on your attitude to fitting[6]: is a reasonable functional form with many parameters something that you trust, or do you prefer to be more prescriptive, and deal with fewer parameters ? Fifty years since its inception, DFT is still developing: communities are still somewhat divided in the approaches that they take to functional development, but there are an increasing number of ways to achieve efficiency and accuracy.

[1] My colleague, Mike Gillan, reckons that we should instead talk about wrestling Jacob when considering how to improve DFT functionals.

[2] J. Chem. Phys. 92, 5397 (1990) DOI: 10.1063/1.458517

[3] J. Chem. Theory Comput. 12, 1280 (2016) DOI: 10.1021/acs.jctc.5b01082

[4] Phys. Rev. Lett. 115, 036402 (2015) DOI: 10.1103/PhysRevLett.115.036402

[5] Phys. Rev. B 93, 205205 (2016) DOI: 10.1103/PhysRevB.93.205205

[6] The oft-quoted maxim about fitting an elephant with four parameters has been put into practice (see the original paper here and a nice write-up and a python implementation here)

Testing the reproducibility of DFT calculations

Tue, 12 Apr 2016 00:00:00 +0000

A paper in Science (or equivalent journal) generally reports novel or ground-breaking research. At first sight, the paper I’ll discuss in this post[1] does not fit into that category: it reports an extensive set of tests on calculations for the equation of state (EOS) for 71 elemental solids using a variety of DFT codes, all using the PBE functional.

This paper is the product of a collaboration (you can find all the data, test suites etc on their web site[2]) that has been going for a while, and is both important and impressive. They have defined a single parameter, delta, which allows them to compare EOS calculated with different codes, giving a simple route to evaluating the reproducibility of DFT. This is immensely valuable, because different codes use different basis sets, different numerical solvers and different approaches to the external potential (full potential or a variety of pseudopotentials), and as a result will give different answers for the same simulation. The question is: how different are the answers ?

The key result from this paper is that modern DFT codes now achieve a precision[3] which is better than experimental; in terms of the paper, this means a delta value which is better than 1 meV/atom. This precision applies across various basis sets: plane waves, augmented plane waves, and numerical orbitals. It also applies to all-electron, PAW, and both ultra-soft and norm-conserving pseudopotential calculations. The summary table from the paper is reproduced below; the numbers given are the RMS value for delta across all 71 elements, while the colour indicates overall reliability.
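To make the idea of a single comparison parameter concrete, here is a minimal sketch, assuming that delta is the root-mean-square energy difference between two fitted EOS curves over a window of plus or minus 6% around the equilibrium volume, with each curve described by a third-order Birch-Murnaghan fit and aligned at its minimum. The two parameter sets are invented for illustration and do not come from the paper; the reference implementation on the collaboration's web site should be used for real comparisons.

```python
import math

def birch_murnaghan(v, v0, b0, b0_prime):
    """Third-order Birch-Murnaghan energy relative to the minimum (E0 = 0).

    v, v0 in A^3/atom and b0 in eV/A^3, so the result is in eV/atom.
    """
    x = (v0 / v) ** (2.0 / 3.0)
    return 9.0 * v0 * b0 / 16.0 * (
        (x - 1.0) ** 3 * b0_prime + (x - 1.0) ** 2 * (6.0 - 4.0 * x))

def delta(eos_a, eos_b, n=2000):
    """RMS difference between two EOS curves over +/-6% of the mean V0."""
    v0_mean = 0.5 * (eos_a[0] + eos_b[0])
    v_lo, v_hi = 0.94 * v0_mean, 1.06 * v0_mean
    h = (v_hi - v_lo) / n
    total = 0.0
    for i in range(n):
        v = v_lo + (i + 0.5) * h            # midpoint rule
        diff = birch_murnaghan(v, *eos_a) - birch_murnaghan(v, *eos_b)
        total += diff * diff
    return math.sqrt(total * h / (v_hi - v_lo))

# Hypothetical (V0, B0, B0') fits from two codes for the same element
code_a = (20.45, 0.550, 4.30)
code_b = (20.50, 0.553, 4.30)
print(f"delta = {1000.0 * delta(code_a, code_b):.3f} meV/atom")
```

Because the curves are aligned at their minima, delta is insensitive to the arbitrary energy zero of each code and responds only to differences in equilibrium volume, bulk modulus and its pressure derivative.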

Why is this work significant ? First, it gives a way to test new DFT codes and implementations, basis sets and approaches to the potential. So we now have an absolute reference against which codes can be compared. Second, it shows that there are now freely-available pseudopotential libraries which are precise in comparison to all-electron results (this is something that wasn’t true even five years ago - their Table 2, which shows the changing precision of different libraries over time, is fascinating). For both users and code developers, this is great news: there is no longer any question as to whether a particular pseudopotential is reliable, certainly within the context of single elements.

What could be added to the study ? Here are some ideas:

More extensive tests. There are no tests of elements in different environments - and this can pose extreme challenges to pseudopotentials (think of the different oxidation states of transition metals, for instance).
A comparison between the codes (e.g. speed, memory or parallelisation). This would be very challenging, but would be interesting data.
More functionals and extensions of DFT will be important to include.

This paper is an immensely valuable contribution to the electronic structure community, as well as the wider scientific community, and it is good to see it published in a high-profile journal.

[1] DOI:10.1126/science.aad3000

[2] Delta value website from centre for molecular modelling

[3] Precision indicates the spread between different measured values, while accuracy indicates the deviation from the correct result (however “correct” is defined !)

An efficient approach to ab initio thermodynamics

Tue, 22 Dec 2015 00:00:00 +0000

Ab initio thermodynamics is both extremely challenging and extremely important. The challenge arises from the need to sample an energy distribution sufficiently well to converge calculations; the importance comes from the insight that we can gain into experimentally inaccessible situations (I have several colleagues who work on iron in the Earth’s core which is not readily accessible experimentally). A new paper[1] suggests an approach to ab initio thermodynamics that will be extremely helpful for certain calculations (and potentially useful for general calculations). I have written about calculations on liquid iron in Section 4.6 of the book, and on general approaches to thermodynamics in Chapter 6.

When finding average values of variables at finite temperature, we have to sample over a set of micro-states which are distributed according to a potential energy, $U_1(\mathbf{r})$, with a Boltzmann factor that depends on the potential giving the probability of each state. The standard approach to this is to use either MD or Monte Carlo (MC) to sample the potential energy surface, possibly using a weighting scheme to speed up convergence. This tends to be quite expensive when using ab initio methods where a long MD run may be required.

The key insight of the new method is that we can perform the same averaging using a set of micro-states that are distributed according to a different potential energy, $U_0(\mathbf{r})$, provided that each state is weighted by a Boltzmann factor of the difference between the two potentials, $U_1(\mathbf{r}) - U_0(\mathbf{r})$. If the new potential is significantly cheaper than the first, then we can perform a long sampling run using this potential, and draw the micro-states from this distribution, reducing significantly the number of expensive calculations that need to be performed.
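The reweighting can be sketched for a one-dimensional toy problem. This is only an illustration of the general idea under my own assumptions, not the scheme of the paper: the "cheap" potential `u0` is a harmonic well that can be sampled exactly, the "expensive" potential `u1` adds a quartic term, and both are hypothetical. An average over $U_1$ is estimated as $\langle A\, e^{-(U_1-U_0)/k_BT}\rangle_0 / \langle e^{-(U_1-U_0)/k_BT}\rangle_0$.

```python
import math
import random

def u0(x):
    # Cheap reference potential (hypothetical): a harmonic well
    return 0.5 * x * x

def u1(x):
    # "Expensive" target potential (hypothetical): harmonic plus quartic term
    return 0.5 * x * x + 0.1 * x ** 4

def reweighted_average(observable, samples, kT):
    """Average over U1 using samples drawn from the Boltzmann distribution of U0."""
    num = 0.0
    den = 0.0
    for x in samples:
        weight = math.exp(-(u1(x) - u0(x)) / kT)
        num += observable(x) * weight
        den += weight
    return num / den

def quadrature_average(observable, kT, x_max=8.0, n=16000):
    """Reference value by direct numerical integration over U1."""
    h = 2.0 * x_max / n
    num = 0.0
    den = 0.0
    for i in range(n):
        x = -x_max + (i + 0.5) * h
        boltzmann = math.exp(-u1(x) / kT)
        num += observable(x) * boltzmann
        den += boltzmann
    return num / den

kT = 1.0
rng = random.Random(0)
# For a harmonic U0 the Boltzmann distribution is a Gaussian of variance kT
samples = [rng.gauss(0.0, math.sqrt(kT)) for _ in range(20000)]

estimate = reweighted_average(lambda x: x * x, samples, kT)
reference = quadrature_average(lambda x: x * x, kT)
print(f"reweighted <x^2> = {estimate:.3f}, quadrature <x^2> = {reference:.3f}")
```

In a real application only a subset of the cheaply generated states would be passed to the expensive method, and the scheme works well only when the two potentials overlap strongly; otherwise a few states dominate the weights.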

This paper presents a careful analysis of how the accuracy of the cheap method (here taken to be a classical potential, ideally fitted to some ab initio MD) affects the sampling. While the method is efficient for standard averages, it is outstanding for thermodynamic integration, where it can reduce the number of simulations by an order of magnitude or more. It is clear that it’s been developed in this context - where the absolute free energy is required. In the context of ab initio thermodynamics, this is a significant step forward.

[1] Comp. Phys. Commun. 127, 1 (2015) DOI:10.1016/j.cpc.2015.07.008

Exploring energy landscapes

Fri, 27 Nov 2015 00:00:00 +0000

Once you have mastered the basics of performing DFT calculations (or other atomistic methods for finding the energy of a structure), you need to move on to understanding the system that you are investigating. Largely speaking, this will involve exploring its energy landscape: the local minima define its thermodynamic properties, while the barriers between the minima define the kinetic behaviour. (The topics in this blog are covered in parts of Chapters 5, 12, 15 and 18 in the book.)

This distinction is very important, and it’s also important to realise that many experiments are not actually operating in the thermodynamic minimum. My favourite example taken from my PhD research relates to the growth of germanium on silicon: when depositing it, a fascinating series of reconstructions is seen, which arise from the strain mismatch between the two materials. But a thin layer of Ge on Si is not the lowest energy structure, as we found when we left a sample annealing over a few days: the lowest energy structure is a dilute alloy of Ge dispersed through Si, which gives a rather featureless surface and little of interest to the surface scientist.

In general, the problem of exploring an energy landscape is extremely difficult, as shown by the wealth of approaches that have been developed (I covered some of these in a blog on high throughput methods). We cannot simply run a molecular dynamics simulation to explore the landscape, because of the timescales involved: MD covers at most microseconds. We can prepare simulations in structures close to obvious, intuitive minima (often guided by experiment), though this risks missing structures. Various algorithms for searching for stable structures exist: random structure searching and genetic algorithms both seek to find only the minima, and are often effective (though they are limited in system size because of the complexity of the problem). I’ve written about how we benchmark search techniques before.

There are approaches that accelerate MD, including hyperdynamics, though these are not guaranteed to explore the entire energy surface. Methods that explore both transition states and minima (such as metadynamics and the dimer method) are growing in both popularity and predictive power, but can be rather expensive computationally.

If you have two minima, there are well-established ways to find the transition state between the two. The simplest approach is simply to decide on some variable that will define a path, and constrain that variable at various points between the start and the end. This is best suited to simple problems such as the diffusion of one atom, and risks missing important behaviour if the energy landscape is at all complicated.

The nudged elastic band method is one of the most widely used approaches to transition state searching. The idea is elegant: we construct a series of replicas of the system, interpolated between the start and end points, and join the images of each atom with springs (or elastic bands). We then minimise the energy of the entire, composite system; the springs stop the atoms that are moving from falling back to the start or the end, and we map out the transition state, at least approximately.

Of course it is not that simple: there is a tendency for images to slide down energy hillsides, so nudging is introduced: the component of the spring force perpendicular to the local tangent to the pathway is projected out, as is the component of the true force along the tangent. We can also invert the component of the force along the pathway for the image with the highest energy, which will force it to climb up to the top of the barrier.
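The force projection can be written down compactly. The following is a minimal sketch, assuming the simplest possible tangent estimate (the normalised vector between the two neighbouring images) and a hypothetical two-dimensional double-well potential; production codes use more careful tangent definitions and optimisers, and `neb_forces` is my own name, not a library function.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def neb_forces(images, gradient, k_spring, climbing=None):
    """Return the NEB force on each interior image of the chain.

    images: list of coordinate lists; gradient(x) returns dE/dx at x.
    climbing: index of an image to treat as a climbing image (optional).
    """
    forces = []
    for i in range(1, len(images) - 1):
        prev, here, nxt = images[i - 1], images[i], images[i + 1]
        tangent = [b - a for a, b in zip(prev, nxt)]
        norm = math.sqrt(dot(tangent, tangent))
        tangent = [t / norm for t in tangent]
        grad = gradient(here)
        grad_parallel = dot(grad, tangent)
        if i == climbing:
            # No springs; the true force along the path is inverted
            force = [-g + 2.0 * grad_parallel * t
                     for g, t in zip(grad, tangent)]
        else:
            # True force: keep only the part perpendicular to the path.
            # Spring force: keep only the part along the path.
            spring = k_spring * (distance(nxt, here) - distance(here, prev))
            force = [-(g - grad_parallel * t) + spring * t
                     for g, t in zip(grad, tangent)]
        forces.append(force)
    return forces

def double_well_gradient(x):
    # Hypothetical surface E = (x^2 - 1)^2 + y^2: minima at (+/-1, 0), saddle at (0, 0)
    return [4.0 * x[0] * (x[0] ** 2 - 1.0), 2.0 * x[1]]

images = [[-1.0, 0.0], [-0.5, 0.3], [0.0, 0.4], [0.5, 0.3], [1.0, 0.0]]
forces = neb_forces(images, double_well_gradient, k_spring=1.0)
for image, force in zip(images[1:-1], forces):
    print(image, force)
```

For the middle image the path tangent lies along x, so the only force that survives is the perpendicular component of the true force, which pulls the image straight down towards the saddle point.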

The question of how many images to use is very difficult: the computational effort increases with images, but so does the accuracy of the description of the surface. I was involved in some work[1] which showed that one image can give a good estimate of the barrier height, but that accuracy only improves for a significant number of images (between 5 and 8, depending on the system).

There is no simple prescription for this problem: it is one of the most complex problems within physics, chemistry and materials science. The areas where it is applied range from the structure of nanoparticles and surface reconstructions to protein folding. You must be careful to be honest about how you have explored the system, and where you might have made errors or missed structures. This is an area where fruitful collaboration with experiment can be a great help, giving data to test against possible structures.

[1] J. Phys.:Condens. Matter 22, 074203 (2010) DOI:10.1088/0953-8984/22/7/074203

How do we establish the accuracy of a method ? Full CI quantum Monte Carlo

Wed, 25 Nov 2015 00:00:00 +0000

One of the topics that is often discussed within the electronic structure community is that of accuracy: how accurate is a given method ? While DFT is efficient and widely applicable, it has many known limitations, and rarely comes close to what is called chemical accuracy (1 kcal/mol or around 43 meV). Recent years have seen various efforts to improve the accuracy of DFT (I have blogged about this before: here, here and here for instance), but while these additions have had some success, they are necessarily limited, and there is no systematic way to improve accuracy in DFT. There is, therefore, a need for well-defined benchmarks against which DFT and other methods can be tested. Experiment often forms one important touchstone, but we need to be confident that the calculation we perform corresponds to the experimental set-up (often a difficult problem). In this blog I will discuss a recently developed approach, full CI quantum Monte Carlo[1], that allows convergence to the exact, many-body wavefunction result (for a given basis set). This gives both an important way to test other methods, and a powerful method for studying problems that need this level of accuracy.

Going beyond standard DFT accuracy normally involves adding extra terms (such as a fraction of exact exchange in hybrid functionals), introducing new functionality (for instance via TDDFT) or using DFT wavefunctions as the input to perturbative expansions (such as GW). One advantage DFT holds over these other methods is in the size of system that can be modelled: it generally scales with the cube of the number of atoms, and can address systems with hundreds or thousands of atoms (with linear scaling DFT we can go to millions of atoms[2] or beyond). More accurate methods generally scale more strongly with atom number and are limited in the size of the system that they can address.

Quantum chemistry techniques differ fundamentally from DFT-based techniques in that they work with approximations to the many-body wavefunction rather than the charge density[3]. A systematic approach to improving accuracy can be defined within this formalism (which I should note is extremely sophisticated and requires more space than I can give here). The starting point is Hartree-Fock theory, which approximates the many-body wavefunction with a Slater determinant of molecular orbitals built from some basis set (almost inevitably Gaussian functions these days). The simplest improvements to Hartree-Fock invoke standard quantum mechanical perturbation theory (such as the MP2 method), but while these methods are powerful and reasonably accurate, they are limited. The configuration interaction (CI) method goes beyond the single determinant of Hartree-Fock, and adds determinants which include all possible excitations of one, two, three (or more) electrons. The full CI solution is prohibitively expensive beyond about ten electrons, though this limit also depends on the completeness of the basis set used.

The quantum Monte Carlo (QMC) family of methods provide an alternative, very accurate approach, and seek to calculate the ground state many-body wavefunction using a stochastic approach (see [4] for a recent overview of these methods, or a review like [5] for more details). However, I want to write about a recent development: full configuration interaction QMC, or FCI-QMC[1].

FCI-QMC works within the space of Slater determinants which are possible, given the system and the basis set chosen. Rather than adjust the coefficients of the determinants, it evolves a stochastic set of walkers, with different populations on different determinants, through the operations of spawning (creating a new walker on a new determinant), cloning (copying an existing walker on the same determinant) and annihilation (removing pairs of walkers with opposite signs on the same determinant; this is needed for proper fermionic behaviour). While the number of walkers required is rather large, the computational and memory cost is very small for each, and it can be shown that this procedure converges to the exact, full configuration interaction result. The only error left is that of the basis set (this does not affect other QMC methods, which work in real space). There are well-established methods for extrapolating to a complete basis set limit.
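The walker dynamics are a stochastic sampling of an imaginary-time projection, and it can help to see the deterministic version of that projection on a tiny problem. The sketch below is not FCI-QMC itself: it applies $\mathbf{c} \leftarrow \mathbf{c} - \tau(\mathbf{H} - S)\mathbf{c}$ directly to the coefficient vector, where the diagonal part plays the role of cloning and death, the off-diagonal part plays the role of spawning, and the shift $S$ keeps the norm under control. The 4x4 "Hamiltonian" is invented for illustration.

```python
import math

# Hypothetical Hamiltonian in a basis of four determinants; the first
# plays the role of the reference (Hartree-Fock) determinant
H = [[-1.0, 0.2, 0.1, 0.0],
     [0.2, 0.0, 0.3, 0.1],
     [0.1, 0.3, 0.5, 0.2],
     [0.0, 0.1, 0.2, 1.0]]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def project_ground_state(h, tau=0.1, steps=2000):
    """Deterministic imaginary-time projection onto the lowest eigenvector."""
    c = [1.0] + [0.0] * (len(h) - 1)     # start on the reference determinant
    shift = h[0][0]
    for _ in range(steps):
        hc = matvec(h, c)
        # Diagonal terms rescale each coefficient; off-diagonal terms
        # transfer weight between determinants
        c = [ci - tau * (hci - shift * ci) for ci, hci in zip(c, hc)]
        norm = math.sqrt(sum(ci * ci for ci in c))
        c = [ci / norm for ci in c]
        # Update the shift to the current energy estimate
        shift = sum(ci * hci for ci, hci in zip(c, matvec(h, c)))
    return shift, c

energy, coefficients = project_ground_state(H)
print(f"projected energy = {energy:.6f}")
print("coefficients:", [round(ci, 4) for ci in coefficients])
```

The stochastic method never stores this full vector: it represents the coefficients by signed walker populations, which is what makes spaces with an astronomically large number of determinants tractable.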

In the paper which prompted me to write this post, the FCI-QMC method was applied to various solid state problems[6]. The paper is remarkable for several reasons, not the least of which is managing to get a purely computational paper, which largely presents benchmark calculations, into Nature.

The authors point out that full CI calculations form an important reference point for other quantum chemistry methods in molecular calculations, as they give the exact result for a given basis. However, these results are not available in the solid state. Recent work has seen a number of developments that allow quantum chemistry approaches to be applied to solid state problems (whether using traditional, gaussian basis sets, or plane wave basis sets).

A solid-state implementation of FCI-QMC is not trivial: the method scales essentially exponentially with k points, though this can be mitigated somewhat by ensuring that sampling between k points obeys momentum conservation. For small cells (LiH), even with modest k point sampling, there are approximately $10^{30}$ determinants to sample. They demonstrate converged calculations on a wide variety of materials, and show that the most accurate of the widely used approximations (CCSD(T), or coupled cluster with singles, doubles and some triples) gives excellent results compared to the exact result.

The method is not cheap: for diamond carbon, with four k points in each direction, the calculation took 25,000 CPU hours, with a relatively modest basis set. However, it does allow us to test the well-established hierarchy of quantum chemical methods in solids, and demonstrate that the best of these go beyond chemical accuracy in solids (even for strongly correlated materials).

Why could this comparison not be made with existing QMC methods, such as diffusion Monte Carlo ? The difference in basis sets is the key issue: DMC works in real space, and the quantum chemistry calculations would have had to be converged to the complete basis set limit to enable comparisons. FCI-QMC works in the same space as the quantum chemistry calculations, and thus gives the exact result for any given choice of system and basis set.

There have been a number of other developments in FCI-QMC, many related to improving the efficiency of the method, but also recently showing that it is possible to sample excited states efficiently[7]. As with all QMC methods, the FCI-QMC method parallelises with very high efficiency (in all of these methods a given walker can operate almost independently), but it is not possible at the moment to evaluate forces accurately. They have a very specific domain of applicability, but within that domain, they are quite possibly the most accurate methods available.

[1] J. Chem. Phys. 131, 054106 (2009) DOI:10.1063/1.3193710

[2] J. Phys.: Condens. Matter 22, 074207 (2010) DOI:10.1088/0953-8984/22/7/074207

[3] Indeed, these different approaches are often referred to, respectively, as wavefunction methods and density methods.

[4] J. Chem. Phys. 143, 164105 (2015) DOI:10.1063/1.4933112

[5] Rev. Mod. Phys. 143, 164105 (2015) DOI:10.1063/1.4933112

[6] Nature 493, 365 (2013) DOI:10.1038/nature11770

[7] J. Chem. Phys. 143, 134117 (2015) DOI:10.1063/1.4932595

Scaling of DFT calculations with system size

Fri, 20 Nov 2015 00:00:00 +0000

As I made clear in my previous tutorial blog, I think that it is important for people using DFT codes to understand some of the internal mechanics. This blog will deal with another technical issue: scaling of the problem with system size.

Why should this matter ? Pragmatically, it is important to know both how long your simulation is likely to take before starting in on it and how large a computational resource you may need. This will also determine whether you can ask certain questions in your simulations: if they will require unreasonable timescales or computer resources[1] then a different study should be designed. There are two key resources: total run time, and memory required. I will discuss run time below; the memory required in general scales with the square of the system size.

The overall scaling of standard DFT codes is often given as $N^{3}$, where $N$ is some measure of system size (whether number of atoms, number of bands or number of basis functions). In plane wave codes, the basis set increases with unit cell volume independently of number of atoms or bands, and this affects the amount of vacuum that is used in surface or molecular studies. However, the simple form of scaling is not the only factor: the pre-factor is important, as is the quantity that scales.

Prefactors will determine the system size at which cubic scaling will become dominant: if the cubic scaling operation has a very small cost compared to a quadratic or linear scaling operation, it will only become significant with large system sizes. This is one reason why linear scaling DFT codes are not more widely used: the pre-factor, at the moment, is rather large. The question of what scales contributes to the pre-factor: to stay with the plane wave example, the number of plane waves is much larger than the number of bands, so an operation that scales as $N_{bands} \times N_{PW}$ will be much cheaper than one that scales as $N_{PW} \times N_{PW}$.

The total energy in DFT is often found by adding different contributions, and these scale differently. The Hartree energy, along with the local pseudopotential energy and the exchange-correlation energy, is found as an integral of a potential with a charge density, and scales linearly with the system size. The kinetic energy requires an integral for each band, and so scales as $N_{bands} \times N_{PW}$ (we can substitute the number of points on a real-space grid for the number of plane waves if this is how the integral is performed), though this has a small pre-factor.

The most expensive part of the energy is the non-local pseudopotential energy, which also scales as $N_{bands} \times N_{PW}$, but has a larger pre-factor. It is more efficient to evaluate this energy using non-local projector functions in real-space than in reciprocal space, but it is still a high cost. In plane wave codes, fast Fourier transforms (FFTs) are also expensive: they scale as $N_{bands} \times N_{PW} \ln N_{PW}$ when all wavefunctions are transformed. Fortunately, they are highly optimised on modern computers; they do, however, involve communication between all processes on a parallel machine, which limits their scaling with number of processes[2].

The cubic scaling that limits DFT approaches actually comes from the requirement to orthogonalise the eigenstates to each other (in a code which optimises the wavefunctions rather than diagonalising the Hamiltonian—which also scales with the cube of the matrix size). This operation cannot be avoided, but does have a small pre-factor, so only becomes significant at large system sizes.
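The origin of the cubic cost can be seen by counting operations in a simple orthogonalisation. The sketch below uses modified Gram-Schmidt on random vectors purely as an illustration (real codes typically use other schemes, and the operation count here is a rough tally of multiply-adds, not a timing): every band is orthogonalised against all earlier bands, and every overlap runs over the whole basis, giving a cost of $N_{bands}^2 \times N_{PW}$.

```python
import math
import random

def gram_schmidt(bands):
    """Orthonormalise the bands in place; return a count of multiply-adds."""
    n_pw = len(bands[0])
    ops = 0
    for i in range(len(bands)):
        for j in range(i):
            overlap = sum(a * b for a, b in zip(bands[j], bands[i]))
            bands[i] = [b - overlap * a for a, b in zip(bands[j], bands[i])]
            ops += 2 * n_pw          # one overlap and one update per pair
        norm = math.sqrt(sum(b * b for b in bands[i]))
        bands[i] = [b / norm for b in bands[i]]
        ops += n_pw                  # normalisation
    return ops

def random_bands(n_bands, n_pw, seed=1):
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_pw)]
            for _ in range(n_bands)]

small = random_bands(8, 64)
large = random_bands(16, 128)      # twice the "system size"
ops_small = gram_schmidt(small)
ops_large = gram_schmidt(large)
print(ops_small, ops_large, ops_large / ops_small)
```

Since both the number of bands and the number of basis functions grow with the number of atoms, doubling the system multiplies this count by eight, which is the cubic scaling described above.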

One factor which actually improves scaling is the Brillouin zone sampling. All of the operations described above have to be performed at each k-point, giving a prefactor of $N_k$ to each cost. As we go to larger system sizes the Brillouin zone sampling required reduces, and the net cost of a simulation scales more slowly than might be expected. However, once truly large systems have been reached, this factor goes to one and cubic scaling dominates.

It is important, therefore, to build up an understanding of how long different calculations will take to run with small simulations, before embarking on a larger simulation. It is also important to realise that parallel scaling is not perfect, and the speed-up gained from increasing the number of processes will be lower than linear (though the memory requirements per process will improve). I should also note that it is possible to achieve linear scaling of computational cost (both in time and memory), and to go to millions of atoms[3].

[1] It is important to remember that time on high-performance computing (HPC) resources is often restricted and awarded through grants, so needs to be used wisely.

[2] Since modern CPUs are multi-core and even run more threads than there are cores, it makes most sense to refer to the number of processes running than the number of CPUs/cores.

[3] J. Phys.: Condens. Matter 22, 074207 (2010) DOI:10.1088/0953-8984/22/7/074207

Why you need to understand how minimisers work

Fri, 13 Nov 2015 00:00:00 +0000

Numerical minimisation is at the heart of most electronic structure codes, and is involved in finding the electronic wavefunctions, often the self-consistent charge density, and in relaxing atoms during structural minimisation. Many of these techniques are very sophisticated, and in modern codes they have often been tuned for performance (sometimes heuristically) but there is no guaranteed way to find a global energy minimum, and they will sometimes fail. So it is very important to understand how they work, and why they might fail. Monitoring a calculation to ensure convergence is generally worthwhile (though this can easily turn into an unhelpful distraction if taken to extremes).

In general, we assume that the energy can be written in terms of some multi-dimensional vector, $\mathbf{x}$, which might represent expansion coefficients for the basis functions in the wavefunctions, or the atomic coordinates, or other parameters. We then expand the energy to second order in $\mathbf{x}$:

$E(\mathbf{x}) = E_0 - \mathbf{g}\cdot\mathbf{x} + \frac{1}{2}\mathbf{x}^\prime \cdot \mathbf{H} \cdot \mathbf{x}$

where $\mathbf{g} = -\partial E/\partial \mathbf{x}$ and $\mathbf{H} = \partial^2 E/\partial \mathbf{x}^\prime \partial \mathbf{x}$, the Hessian, which gives the curvature of the energy surface.

All methods use an iterative approach: we choose a direction in which to optimise, find a minimum along that direction, and repeat until some convergence criterion is reached.

Steepest Descents

This is the simplest approach to minimisation, and is generally a very poor choice of method. The search direction is always taken as $\mathbf{g}$, the local downhill gradient. While this is simple, its efficiency depends strongly on the starting point, as illustrated below for a simple, two-dimensional problem where we ought to be able to find the ground state in two steps.

There are two reasons that steepest descents performs so badly: first, it takes no account of previous minimisations; second, it does not use any information about the curvature (the Hessian, above). We can see that the Hessian is useful by taking the equation for $E$ above, and seeking the stationary points (i.e. solving for $\partial E/\partial \mathbf{x} = 0$ ):
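The dependence on the starting point is easy to reproduce. Here is a minimal sketch, assuming a hypothetical two-dimensional quadratic energy $E = \frac{1}{2}(x^2 + 10y^2)$ with its minimum at the origin, and an exact line minimisation along each search direction (which is only available in closed form because the energy is quadratic).

```python
import math

# Hypothetical Hessian: the energy is ten times stiffer along y than along x
H = [[1.0, 0.0], [0.0, 10.0]]

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def downhill_gradient(x):
    # g = -dE/dx for E = 0.5 * x.H.x
    return [-c for c in matvec(H, x)]

def steepest_descents(x, tol=1e-6, max_steps=1000):
    """Follow the local downhill gradient with exact line minimisation."""
    steps = 0
    g = downhill_gradient(x)
    while math.hypot(g[0], g[1]) > tol and steps < max_steps:
        hg = matvec(H, g)
        # Step length that minimises E along g
        alpha = (g[0] * g[0] + g[1] * g[1]) / (g[0] * hg[0] + g[1] * hg[1])
        x = [x[0] + alpha * g[0], x[1] + alpha * g[1]]
        g = downhill_gradient(x)
        steps += 1
    return x, steps

x_good, steps_good = steepest_descents([10.0, 0.0])  # start on a principal axis
x_bad, steps_bad = steepest_descents([10.0, 1.0])    # start slightly off axis
print(f"on axis: {steps_good} step(s); off axis: {steps_bad} steps")
```

Starting on a principal axis of the energy surface, the gradient points straight at the minimum; starting slightly off the axis, successive search directions are at right angles to each other and the path zig-zags down the valley.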

$\frac{\partial E}{\partial \mathbf{x}} = -\mathbf{g} + \mathbf{H}\cdot \mathbf{x}\\ \frac{\partial E}{\partial \mathbf{x}} = 0 \Rightarrow \mathbf{g} = \mathbf{H} \cdot \mathbf{x} \Rightarrow \mathbf{x} = \mathbf{H}^{-1}\cdot \mathbf{g}$

If we had the full Hessian, then we could find the minimum, but this would be prohibitively expensive. Improved methods approximate the Hessian, as we will see.
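A minimal sketch of this behaviour, assuming a purely quadratic energy with a known two-dimensional Hessian (the names `steepest_descent`, `H` and `b` and the numbers used are illustrative assumptions; `b` plays the role of the constant vector in the expansion above, and the local downhill gradient is $\mathbf{g} = \mathbf{b} - \mathbf{H}\cdot\mathbf{x}$):

```python
import numpy as np

def steepest_descent(H, b, x0, tol=1e-10, max_iter=10_000):
    """Minimise E(x) = -b.x + 0.5 x.H.x by steepest descents.

    Returns the final x and the number of line minimisations used.
    """
    x = np.array(x0, dtype=float)
    for n in range(max_iter):
        g = b - H @ x                    # local downhill gradient, g = -dE/dx
        if np.linalg.norm(g) < tol:
            return x, n
        alpha = (g @ g) / (g @ H @ g)    # exact line minimum (quadratic energy only)
        x = x + alpha * g
    return x, max_iter

# Illustrative Hessian: curvatures differ by a factor of 20
H = np.array([[1.0, 0.0], [0.0, 20.0]])
b = np.array([1.0, 20.0])

x_sd, n_sd = steepest_descent(H, b, x0=[0.0, 0.0])
x_newton = np.linalg.solve(H, b)         # x = H^{-1} g, a single step with the full Hessian

print("steepest descents steps:", n_sd)
print("steepest descents minimum:", x_sd)
print("Hessian solution:", x_newton)
```

Both routes reach the same minimum, but steepest descents needs far more than the two line minimisations that a two-dimensional quadratic problem should require.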

Conjugate gradients

If we choose the search direction, $\mathbf{h}_n$ at any given iteration $n$ to be conjugate to the previous search direction, where conjugacy is defined by $\mathbf{h}_m \cdot \mathbf{H} \cdot \mathbf{h}_n = 0$, then the minimisation of the previous step will not be affected by the present step, and the local curvature is accounted for.

The key to the conjugate gradients method is that this condition can be imposed without calculating the Hessian, if we choose:

$\mathbf{h}_{n+1} = \mathbf{g}_{n+1} + \gamma \mathbf{h}_n\\ \gamma = \frac{\mathbf{g}_{n+1}\cdot\mathbf{g}_{n+1}}{\mathbf{g}_n\cdot\mathbf{g}_n}$

The maths leading to this formula is not too complex, and can be found in a variety of places[1]. The formula given here is easy to implement, and ensures that successive search directions are conjugate, while successive local gradients are orthogonal. The conjugate gradients method is widely implemented and generally reliable, though it requires a good line minimiser (see below for more on this topic) and can fall prey to ill conditioning (also discussed below).
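The update formula can be sketched for a quadratic energy as follows. This is an illustration under stated assumptions rather than a production implementation: the Hessian `H` is used only to perform the exact line minimisation and to check conjugacy afterwards, and the function and variable names are hypothetical.

```python
import numpy as np

def conjugate_gradients(H, b, x0, tol=1e-10, max_iter=100):
    """Minimise E(x) = -b.x + 0.5 x.H.x using the gamma formula above.

    Returns the final x, the number of line minimisations, and the
    list of search directions used.
    """
    x = np.array(x0, dtype=float)
    g = b - H @ x                  # local downhill gradient
    h = g.copy()                   # first direction is the steepest descent one
    directions = []
    for n in range(max_iter):
        if np.linalg.norm(g) < tol:
            return x, n, directions
        directions.append(h.copy())
        alpha = (g @ h) / (h @ H @ h)         # exact line minimum along h
        x = x + alpha * h
        g_new = b - H @ x
        gamma = (g_new @ g_new) / (g @ g)     # ratio of new to old gradient norms
        h = g_new + gamma * h                 # new direction, conjugate to the old
        g = g_new
    return x, max_iter, directions

# Illustrative two-dimensional problem with very different curvatures
H = np.array([[1.0, 0.0], [0.0, 20.0]])
b = np.array([1.0, 20.0])

x_cg, n_cg, dirs = conjugate_gradients(H, b, x0=[0.0, 0.0])
conjugacy = dirs[0] @ H @ dirs[1]             # should vanish to rounding error

print("line minimisations:", n_cg)
print("minimum:", x_cg)
print("h_0 . H . h_1 =", conjugacy)
```

For this two-dimensional quadratic the minimum is reached after two line minimisations, and the two search directions are conjugate to within rounding error.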

Quasi-Newton methods

The maths for quasi-Newton methods is a little more complex, so I will not detail it here, but the essence of the approach is simple. It generalises the Newton-Raphson approach to multiple dimensions, and builds up an approximation to the inverse Hessian over the course of the optimisation, using the same basic formula for the optimum value of $\mathbf{x}$ given above. Generally there is a need to truncate the amount of information stored to keep the memory requirements reasonable, but beyond this restriction, the method is very efficient.
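As a hedged illustration of how the inverse Hessian can be built up, the sketch below uses the BFGS update, one common quasi-Newton formula (the text above does not single out a particular one). It assumes a quadratic energy so that the line minimisation can be done exactly, stores the full matrix rather than a truncated history, and uses hypothetical names throughout.

```python
import numpy as np

def bfgs_quadratic(H, b, x0, n_steps):
    """Quasi-Newton (BFGS) minimisation of E(x) = -b.x + 0.5 x.H.x.

    Returns the final x and the accumulated inverse-Hessian estimate B.
    """
    x = np.array(x0, dtype=float)
    identity = np.eye(len(x))
    B = identity.copy()               # initial guess for the inverse Hessian
    g = b - H @ x                     # local downhill gradient, g = -dE/dx
    for _ in range(n_steps):
        h = B @ g                     # search direction: x = H^{-1} g with B for H^{-1}
        alpha = (g @ h) / (h @ H @ h) # exact line minimum (quadratic energy only)
        s = alpha * h                 # step taken
        x = x + s
        g_new = b - H @ x
        y = g - g_new                 # change in dE/dx over the step
        rho = 1.0 / (y @ s)
        # BFGS update of the inverse Hessian estimate
        B = ((identity - rho * np.outer(s, y)) @ B @ (identity - rho * np.outer(y, s))
             + rho * np.outer(s, s))
        g = g_new
    return x, B

# Illustrative symmetric, positive-definite Hessian in three dimensions
H = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

x_qn, B_final = bfgs_quadratic(H, b, x0=np.zeros(3), n_steps=3)

print("minimum:", x_qn)
print("max error in inverse Hessian:", np.abs(B_final - np.linalg.inv(H)).max())
```

For a quadratic energy with exact line minimisations, the estimate matches the true inverse Hessian after as many steps as there are dimensions; for a real energy surface it is only ever an approximation.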

Line minimisation

Finding the minimum in a given search direction is a key part of these algorithms. The most robust approach first seeks to bracket the minimum, by taking successively larger steps downhill, and then refines the brackets to find the minimum (using bisection or some more sophisticated approach). The problem with this approach is efficiency: it can require many evaluations, which are computationally wasteful.
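The bracketing approach can be sketched as below, assuming the search starts at $t=0$ and that increasing $t$ is initially downhill. Golden-section search is used for the refinement as one example of a method more sophisticated than bisection; the function names, step sizes and the test function are illustrative assumptions.

```python
import math

def bracket_minimum(f, step=0.1, grow=2.0, max_steps=50):
    """Take successively larger steps downhill until the function rises.

    Returns (lower, upper, n_eval) with the minimum between lower and upper.
    """
    a, b = 0.0, step
    fa, fb = f(a), f(b)
    n_eval = 2
    if fb > fa:
        return a, b, n_eval          # minimum lies within the first step
    for _ in range(max_steps):
        c = b + grow * (b - a)       # each step is larger than the last
        fc = f(c)
        n_eval += 1
        if fc > fb:
            return a, c, n_eval      # function has turned upwards: bracket found
        a, b, fa, fb = b, c, fb, fc
    raise RuntimeError("no bracket found")

def golden_section(f, a, b, tol=1e-6):
    """Refine a bracket [a, b] down to width tol; returns (t_min, n_eval)."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = f(c), f(d)
    n_eval = 2
    while (b - a) > tol:
        if fc < fd:                  # minimum is in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:                        # minimum is in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
        n_eval += 1
    return 0.5 * (a + b), n_eval

def energy_along_line(t):
    """Illustrative one-dimensional energy with its minimum at t = 1.3."""
    return (t - 1.3) ** 2 + 0.1 * (t - 1.3) ** 4

lower, upper, n_bracket = bracket_minimum(energy_along_line)
t_min, n_refine = golden_section(energy_along_line, lower, upper)

print("bracket:", lower, upper)
print("minimum at t =", t_min)
print("function evaluations:", n_bracket + n_refine)
```

The evaluation count printed at the end makes the efficiency problem concrete: in an electronic structure code each of those evaluations is a full energy calculation.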

A simpler alternative is to take a step with a length that is estimated to be close to the minimum, and use inverse quadratic interpolation to find the minimum from the two points and two gradients available. The approach can work very well when a function is close to quadratic, but often leads to errors in the early stages.
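A simple stand-in for this idea is sketched below. It uses only the slopes along the search direction at the start point and at the trial step, and assumes the slope varies linearly between them (a secant step, which is equivalent to assuming a parabolic energy); this is a simplification for illustration and not necessarily the interpolation any particular code uses. The function names and test functions are hypothetical.

```python
def interpolated_step(slope_at, trial_step):
    """Estimate the line minimum from slopes at t = 0 and t = trial_step.

    Assumes the slope is linear in t, so the estimate is the point
    where the straight line through the two slopes crosses zero.
    """
    d0 = slope_at(0.0)
    d1 = slope_at(trial_step)
    return -d0 * trial_step / (d1 - d0)

def quadratic_slope(t):
    """Slope of 2 (t - 0.7)^2, which has its minimum at t = 0.7."""
    return 4.0 * (t - 0.7)

def quartic_slope(t):
    """Slope of (t - 1)^4, which has its minimum at t = 1."""
    return 4.0 * (t - 1.0) ** 3

t_quadratic = interpolated_step(quadratic_slope, trial_step=0.5)
t_quartic = interpolated_step(quartic_slope, trial_step=0.5)

print("quadratic estimate:", t_quadratic, "(true minimum 0.7)")
print("quartic estimate:  ", t_quartic, "(true minimum 1.0)")
```

The estimate is exact for the quadratic function after a single trial step, but falls well short of the true minimum for the quartic, which is the kind of error that appears far from the minimum.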

Problems

The most common problem facing numerical optimisation is ill conditioning. If the Hessian has eigenvalues that span a large range, then in some directions the gradient will be very steep, while in others it will be very shallow. This makes it very hard to find the minimum. The ideal solution would be to adjust the curvatures so that they are the same in all directions, which is just the same as inverting the Hessian and applying it to the gradient. Preconditioning approaches estimate an inverse Hessian and use it to improve the convergence of the minimisation. A famous example in electronic structure relates to the kinetic energy of the electrons, and is most easily understood in a plane wave basis set, where the kinetic energy is proportional to $G^2$ for the wave vector $\mathbf{G}$. The kinetic energy of the large wavevector components dominates the gradient, giving classic ill conditioning; the solution is to scale these components by $1/G^2$ while leaving the smaller wavevector components unchanged[3].
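The effect of this kind of scaling can be sketched with a toy model. The model assumes a diagonal Hessian with curvature $\frac{1}{2}G^2 + 1$ for each plane wave (kinetic energy plus a constant, in arbitrary units) and a scaling factor of the form $f(G) = G_0^2/(G_0^2 + G^2)$; the model curvature, the value of $G_0$ and the range of $G$ are all illustrative assumptions.

```python
import numpy as np

G = np.linspace(0.0, 10.0, 101)          # magnitudes of the wave vectors
curvature = 0.5 * G**2 + 1.0             # model diagonal Hessian (assumed form)

G0 = 2.0                                 # crossover wave vector (assumed value)
f = G0**2 / (G0**2 + G**2)               # ~1 for G < G0, ~G0^2/G^2 for large G

preconditioned = f * curvature           # effective curvature after scaling

# Ratio of largest to smallest curvature: the condition number
cond_raw = curvature.max() / curvature.min()
cond_pre = preconditioned.max() / preconditioned.min()

print("condition number without preconditioning:", cond_raw)
print("condition number with preconditioning:   ", cond_pre)
```

In this model the spread of curvatures falls from a factor of about fifty to less than two, so the minimiser sees a nearly isotropic energy surface.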

During structural optimisations, this type of behaviour is often seen when there are very soft modes (where groups of atoms can rotate or rock almost freely). There are other issues: if the electronic minimisation is not fully converged then the structural optimisation can fail (it always pays to check the convergence), and large changes in electronic structure with atomic structure can also give issues (often helped by introducing a larger electronic temperature).

You will inevitably encounter situations where your optimisations or minimisations fail, and understanding how they work can help to diagnose and fix the problem.

[1] While Numerical Recipes is generally the best place to look, I have found the analysis from Jonathan Shewchuk most helpful for this topic[2]

[2] See the first entry here

[3] This can be managed using a factor like $f(G) = G_0^2/(G_0^2 + G^2)$ which is close to 1 if $G<G_0$ but is close to $1/G^2$ for large values of $G$.

An introduction to pseudopotentials

Fri, 06 Nov 2015 00:00:00 +0000

Pseudopotentials are widely used throughout physics (and in some parts of quantum chemistry); in brief, they use the screening effects of the core electrons on the valence electrons to replace the nuclear potential with a softer[1] potential, and to remove the core electrons from the problem that is being solved. For a detailed discussion of pseudopotentials, I recommend Richard Martin’s book[2].

The ideas in pseudopotential theory date back at least fifty years, and come from a solid-state background. The difficulty with choosing a basis set for solid-state problems comes from the two different behaviours of the wavefunctions: near the nuclei, they are strongly bound, rapidly varying and rather close to atomic wavefunctions; in the interstitial space, they are much smoother, and slowly varying.

There are several methods which try to address this problem head-on, by using a basis set with both characters: atomic-like functions near the nuclei, supplemented with plane waves (e.g. the FLAPW method). An early approach, which is helpful in understanding the whole problem, is the orthogonalised plane wave approach (OPW) where the basis functions are written as plane waves, orthogonalised to local, atom-centred functions:

$\phi_q(\mathbf{r}) = e^{i\mathbf{q}\cdot\mathbf{r}} - \sum_j c_j u_j(\mathbf{r})$

The functions $u_j(\mathbf{r})$ are the atom-centred functions, often taken to be the core wavefunctions from the atoms. This formalism links directly to the pseudopotential formalism, and gives a clear idea of the process being undertaken. With a pseudopotential, we only solve the Schrödinger equation for the valence electrons, and we substitute the full potential with a pseudopotential which is shallower and smoother than the full nuclear potential. This potential is normally only applied up to a certain cut-off radius, beyond which the pseudopotential is exactly equal to the full potential.

A good reproduction of the scattering of the nuclear potential often requires that different potentials are found for each angular momentum value. You will find the terms semi-local (where a potential is non-local in angular variables but local in radius, $r$: $\hat{V} = \sum_{lm} \vert Y_{lm}\rangle V_l(r) \langle Y_{lm}\vert$) and non-local (where the pseudopotential is written in a separable form, such as $\sum_j f_j(\mathbf{r}) g_j(\mathbf{r^\prime})$, which is fully non-local in all variables; this becomes an integral operator), though a detailed discussion is beyond the scope of this blog.

Types of pseudopotential

There are three commonly used types: norm-conserving; ultrasoft; and projector-augmented waves (PAWs)[3]. It is important to understand the basic ideas behind each of these types. When considering pseudopotentials, it is important to be aware of the accuracy of the potential and its transferability: that is, how accurate it is in different environments, in particular going from the simple, atomic calculation where it is generated to a more complex environment.

Norm-conserving pseudopotentials require the norm of the pseudofunctions to be the same as the norm of the all-electron wavefunctions:

$\int_0^{r_c} r^2 \vert \psi^{PS}_i(\mathbf{r})\vert^2 dr = \int_0^{r_c} r^2 \vert \psi^{AE}_i(\mathbf{r})\vert^2 dr$

where $r_c$ is the core radius of the pseudopotential. This requirement was found to make more accurate, transferable pseudopotentials (it can be shown that it enforces a further condition, that the energy derivative of the logarithmic derivative of the pseudofunctions also matches that of the all-electron wavefunctions).

These pseudopotentials are still commonly used, but are challenging, particularly for the 2p states of first-row elements and the 3d states of transition metals, where there are no core electrons of the same angular momentum to screen the potential. Accuracy and transferability often lead to small core radii, which in turn make the potential harder.

Ultrasoft pseudopotentials relax the norm-conservation constraint, and often introduce multiple potentials for each angular momentum channel. This enables a softer potential, and matching the all-electron eigenvalues at more energies, often giving a larger core radius than is possible for a norm-conserving pseudopotential with the same accuracy.

Projector-augmented waves (PAWs) give what is effectively an all-electron method with the cost of a pseudopotential method (though there are some subtleties here). There is a very strong link between PAWs and ultrasoft potentials[4]. The method defines a transformation between the pseudofunctions and the all-electron wavefunctions, that changes the pseudofunctions only within spheres centred on atoms.

Within these spheres, the wavefunctions are expanded as partial waves, with coefficients that are found using projector functions (which are very similar to projector functions found in pseudopotentials). The PAW method allows for the easy reconstruction of all-electron wavefunctions, but does not allow the core wavefunctions to respond during the simulation.

A caution

I want to end with a caution: there are now many libraries of pseudopotentials that are available with electronic structure codes, and there is a strong temptation to simply use the potential without any testing. This is a very dangerous thing to do, and every potential should always be tested carefully before embarking on production calculations. You should be clear about how they were generated (in particular about the core radii) and how they respond in situations of differing density or valence. A pseudopotential is an approximation, though a well understood and well-defined one, and that requires testing and understanding.

[1] Soft when applied to potentials means smooth and shallow, while hard means rapidly varying and generally deep. The number of bound states in the potential will correlate with its depth, and because higher states are orthogonal to lower, they must have more nodes. More nodes in a function generally means a higher second derivative (i.e. kinetic energy) which in turn makes representation either on a real-space grid or with Fourier components (plane waves) more costly.

[2] Electronic Structure, Richard M. Martin, Cambridge 2004. Book website

[3] There are some people who do not class PAWs as pseudopotentials, though this distinction is largely academic.

[4] G. Kresse and D. Joubert, Phys. Rev. B 59, 1758 (1999) DOI:10.1103/PhysRevB.59.1758

A discussion of the background theory to DFT

Fri, 30 Oct 2015 00:00:00 +0000

These blogs are aimed at fourth year undergraduates/Masters students, with a relatively restricted time for projects (six months or so); we do not have time to discuss DFT in full detail, but it is important to cover the background theory sufficiently thoroughly to understand its limitations and capabilities. This week we covered DFT theory in more depth (some of this has been covered in the first blog of this series, here).

The most important thing to understand about DFT is that it can be shown that the ground state properties of a system depend only on the charge density, and not on the full details of the many-body wavefunction (this is essentially what the Hohenberg-Kohn theorems tell us). The idea dates back to Dirac if not earlier[1]. As a result, we can write the total energy of the system as a functional (a function of a function) of the charge density.

The energy can be helpfully written as the sum of a number of terms: kinetic energy of the electrons; electron-ion energy; electron-electron energy; and the ion-ion energy. It is vital to include this last term, as it determines the stability of our system. We will work with classical ions and within the Born-Oppenheimer approximation (decoupling electron and ionic degrees of freedom: these are not always good approximations but simplify the problem considerably). In this context, the ion-ion interaction is simply the classical electrostatics of a set of point charges[2].

The kinetic energy can, in principle, be written in terms of the charge density, and indeed this was the approach of Thomas and Fermi (the Thomas-Fermi functional is still used, and shows that the kinetic energy can be found as $E_{KE} \propto \int n^{5/3}(\mathbf{r}) d\mathbf{r}$, with $n(\mathbf{r})$ the electron density). But the accuracy of this approach is poor, and almost all DFT uses the Kohn-Sham approach, which writes the kinetic energy in terms of a set of non-interacting electrons:

$E_{KE} = -\frac{\hbar^2}{2m} \sum_n \langle \psi^{KS}_{n} \vert \nabla^2 \vert \psi^{KS}_{n}\rangle$

The density is then written as $n(\mathbf{r}) = \sum_{n} \vert \psi^{KS}_{n}\vert^2 $. It is important to note that this kinetic energy is not the same as the kinetic energy for the many-body wavefunction (while the operator is the same, the wavefunction is different) and this difference is included in another term.
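To give a feel for how poor the Thomas-Fermi kinetic energy can be, the sketch below evaluates it for the hydrogen atom, where the exact answer is known. It assumes Hartree atomic units, the exact 1s density $n(r) = e^{-2r}/\pi$, and the standard Thomas-Fermi prefactor $\frac{3}{10}(3\pi^2)^{2/3}$; the radial grid and variable names are illustrative choices.

```python
import numpy as np

# Thomas-Fermi prefactor in Hartree atomic units
C_TF = 0.3 * (3.0 * np.pi**2) ** (2.0 / 3.0)

r = np.linspace(0.0, 30.0, 30_001)       # radial grid in bohr
n = np.exp(-2.0 * r) / np.pi             # hydrogen 1s electron density

def radial_integral(values):
    """Integrate a spherically symmetric function over all space (trapezoid rule)."""
    integrand = 4.0 * np.pi * r**2 * values
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r))

n_electrons = radial_integral(n)                     # should integrate to one electron
t_tf = C_TF * radial_integral(n ** (5.0 / 3.0))      # Thomas-Fermi kinetic energy
t_exact = 0.5                                        # exact hydrogen value in Hartree

print("electrons:", n_electrons)
print("Thomas-Fermi kinetic energy (Ha):", t_tf)
print("exact kinetic energy (Ha):       ", t_exact)
```

The Thomas-Fermi value comes out at a little under 0.29 Ha against an exact 0.5 Ha, an error of more than 40% even with the exact density as input.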

The electron-ion interaction involves a sum over the potentials from each ion, using either the bare Coulomb potential or a pseudopotential (which I will discuss in my next blog). While the details of pseudopotential generation and implementation are complex and important, the potential and its associated energy are quite simple: $V(\mathbf{r}) = \sum_I V_I(\mathbf{r})$, $E_{eI} = \int d\mathbf{r} V(\mathbf{r}) n(\mathbf{r}) $.

The electron-electron interaction is split into two terms: the classical electrostatic energy of a charge density, often called the Hartree energy, and the exchange-correlation energy. The Hartree energy is written:

$E_{Har} = \frac{1}{2}\int\int \frac{n(\mathbf{r})n(\mathbf{r}^\prime)}{\vert \mathbf{r} - \mathbf{r}^\prime\vert} d\mathbf{r} d\mathbf{r}^\prime$

This is normally found, along with the potential, using fast Fourier transforms (FFTs) on an evenly spaced grid throughout real-space. It conceals one of the nastier errors in DFT: the self-interaction error. The energy for each electron contains the effect of that electron interacting with itself - which is quite wrong. It gives a tendency for DFT to delocalise charge more than it should[3].
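The self-interaction can be made concrete with the hydrogen atom, which has only one electron and so should have no electron-electron energy at all. The sketch below evaluates the Hartree energy of the exact 1s density on a radial grid rather than with FFTs, which is a simplification that works only because the density is spherically symmetric. It assumes Hartree atomic units; the grid and helper names are illustrative.

```python
import numpy as np

r = np.linspace(1e-6, 30.0, 30_001)      # radial grid in bohr (avoids r = 0)
n = np.exp(-2.0 * r) / np.pi             # hydrogen 1s electron density
dr = np.diff(r)

def running_integral(y):
    """Trapezoid-rule integral of y from the first grid point to each r."""
    steps = 0.5 * (y[1:] + y[:-1]) * dr
    return np.concatenate(([0.0], np.cumsum(steps)))

shell = 4.0 * np.pi * r**2 * n           # charge per unit radius

# Hartree potential of a spherical density:
#   charge inside r acts as a point charge at the origin,
#   charge outside r contributes a constant potential.
inside = running_integral(shell)                 # charge enclosed within r
outside_running = running_integral(shell / r)
outside = outside_running[-1] - outside_running  # integral from r to the grid edge
v_hartree = inside / r + outside

e_hartree = 0.5 * running_integral(shell * v_hartree)[-1]

print("Hartree energy of one electron (Ha):", e_hartree)
print("analytic value 5/16 (Ha):           ", 5.0 / 16.0)
```

The entire result, 5/16 Ha, is spurious self-interaction, and must be cancelled by the exchange-correlation term if the total energy is to be right.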

The final term in the energy is the exchange and correlation energy, which accounts for all the many-body terms that have been left out so far; it can be shown that this reformulation is, in principle, exact - the problem is that we do not know the correct form of the exchange-correlation functional.

There is a vast number of XC functionals available, each of which has both advantages and disadvantages. I will briefly list the hierarchy now.

Local density approximation (LDA). This was the approximation originally proposed (and comes in part from classical DFT, which is used to model liquid flow), and assumes that the energy can be written as the integral over all space of the charge density multiplied by the exchange-correlation energy of a uniform electron gas with the density at that point in space. This energy can be found from quantum Monte Carlo calculations (among others). LDA tends to over-bind, giving lattice constants that are too small, and binding energies that are too large. It is remarkably successful, mainly because it is not as crude as it seems (see a recent review[4] for some details, especially part IV C, and the literature for full details).
Generalised gradient approximations (GGA). These extend the LDA, and consider functionals that depend both on the charge density and its gradient. They often give better binding energies and overall results, but the wealth of different functionals should give a clue to the fact that there is no single perfect functional. Commonly used functionals include PBE (and a version which was reparameterised for solid-state, PBEsol) and BLYP.
Meta-GGAs add the kinetic energy density of the electron gas as a further variable. They offer improvements in some areas, but as far as I am aware, there is no consensus on whether they are always better.
Hybrid functionals. Here, some Hartree-Fock exchange (calculated using the Kohn-Sham eigenstates) is mixed into the functional. Hybrids often improve on band gaps (which are notoriously poor in LDA and GGA) and reaction barriers, but are computationally expensive (particularly for plane wave implementations). It should be noted that the fraction of exchange to be mixed in is not defined a priori, and forms a parameter. PBE0 and B3LYP are probably the two most widely used hybrid functionals. In solid state codes, screened hybrids (where exact exchange is only used for short-range exchange and standard LDA/GGA exchange is used at long ranges) are common and improve the efficiency, though again the range for screening must be fitted somehow.

It is very important to understand what functional you are using, and how it is limited, rather than simply using what seems best or easiest. DFT is a very powerful method, but has significant limitations. A significant part of any paper or thesis should be to consider what errors the choice of functional and/or parameters could introduce into the results.

[1] He makes this point in his excellent book on quantum mechanics as well as other places.

[2] For a periodic system, this is not a trivial problem to solve, as the electrostatic interaction is long-ranged. The standard approach is to use the Ewald method, which splits the problem into short-range (solved in real space) and long-range (solved in reciprocal space).

[3] In Hartree-Fock theory, this self-interaction is exactly cancelled by an equivalent term in the exchange energy; however, DFT writes the exchange in terms of the density and so does not cancel the term.

[4] Rev. Mod. Phys. 87, 897 (2015) DOI:10.1103/RevModPhys.87.897