
HEATS OF FORMATION OF SOLIDS WITH ERROR . . . PHYSICAL REVIEW B 91, 235201 (2015)
FIG. 2. (Color online) (a) Shows the probability distribution of
the calculated error ($H_{\mathrm{mBEEF}} - H_{\mathrm{Expt}}$) in the heat of formation
divided by the estimated error ($\sigma_{\mathrm{BEE}}$) from the ensemble of functionals.
(b) Shows the probability distribution of the calculated
error ($H^{\mathrm{FERE}}_{\mathrm{mBEEF}} - H_{\mathrm{Expt}}$) in the heat of formation divided by
the estimated error ($\sigma^{\mathrm{FERE}}_{\mathrm{BEE}}$) from the ensemble of functionals after
correcting the reference phase energies. The ensemble energies have
also been recalculated employing the fitting, eventually giving the
new error estimates $\sigma^{\mathrm{FERE}}_{\mathrm{BEE}}$. The green plots in (a) and (b) show the
Gaussian distributions with zero mean and unit standard deviation.
running average calculated as [24]
$$P\!\left(\tfrac{1}{2}\,(x_i + x_{i+J})\right) \approx \frac{J}{N\,(x_{i+J} - x_i)}, \tag{3}$$
with $x_i$ being the ratio between actual error and predicted
error, and the parameter J = 20. For a perfect statistical error
prediction one could expect that the distribution would be
Gaussian with a width of 1, which is also shown in the figure
for comparison. The large peak in the histogram around zero
shows that there is some tendency for the error prediction to
be on the large side, but the overall agreement is quite good.
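
As an illustration of the running average in Eq. (3), the following is a minimal sketch and not the authors' code. It assumes that the ratios x_i are sorted in increasing order and that N is the total number of ratios; neither is stated explicitly in the text. The function name and the synthetic Gaussian sample are hypothetical and serve only to show how the estimate can be compared with a unit-width Gaussian.

```python
import numpy as np

def running_average_density(x, J=20):
    """Density estimate following Eq. (3).

    Assumptions (not stated in the text): x is sorted before use and
    N is the total number of samples. Returns the midpoints
    (x_i + x_{i+J}) / 2 and the estimate J / (N * (x_{i+J} - x_i)).
    """
    x = np.sort(np.asarray(x, dtype=float))
    N = x.size
    if N <= J:
        raise ValueError("need more than J samples")
    lo, hi = x[:-J], x[J:]            # x_i and x_{i+J}
    midpoints = 0.5 * (lo + hi)
    # Note: repeated values spanning J samples would give hi == lo
    # and hence a division by zero; this sketch does not handle that.
    density = J / (N * (hi - lo))
    return midpoints, density

# Synthetic ratios drawn from a unit Gaussian (illustrative data only).
rng = np.random.default_rng(0)
ratios = rng.standard_normal(2000)
mid, dens = running_average_density(ratios, J=20)

# Reference curve: Gaussian with zero mean and unit standard deviation.
gauss = np.exp(-0.5 * mid**2) / np.sqrt(2.0 * np.pi)
print("mean absolute deviation from unit Gaussian:", np.mean(np.abs(dens - gauss)))
```

A larger J smooths the estimate at the cost of resolution, which is the usual trade-off for this kind of nearest-neighbor density estimate.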
If the FERE fitting procedure is applied to the mBEEF
ensemble the ratios of real to predicted errors result in the
histogram shown in Fig. 2(b). Both the real ($H^{\mathrm{FERE}}_{\mathrm{mBEEF}} - H_{\mathrm{Expt}}$)
and the predicted errors ($\sigma^{\mathrm{FERE}}_{\mathrm{BEE}}$) are now smaller but
the relative distribution remains fairly close to a Gaussian
of unit width. However, now a tail in the histogram appears
indicating that for some systems the predicted error can be 3 or
4 times smaller than the actual error. This is a fairly common
feature of the ensemble approach [26].
E. Cross validation
In any regression process it is necessary to validate the
quality of the regression over a set of test data which is
not part of the training data set. Overfitting, i.e., more
parameters in the model than required to model the data, will
lead to poor prediction of the test data set. One of the most
important features that a fitting scheme should possess is the
predictive power on a completely new data set. One might expect
good predictions on a data set which is similar in nature to
the training data set. For example, in our case, we expect
good predictions for the binary compounds since we use
only binary compounds in the training data set. The fitting
procedure provides corrections for the reference energies of the
elements which are independent of the chemical environments
of the atoms. Therefore, we can expect that if the environments
change considerably, which can, for example, be the case for
ternary or quaternary compounds, the improvement will be
less pronounced.
Hence, in the test we include not only the binary compounds
but the ternary compounds as well. We compose a set of 24
binary and ternary compounds where the experimental heats
of formation are available and which are not present in the
training data. We summarize the results in Table IV. As for
the training set, the MAE and σ in general show a significant
decrease with the fitted reference energies, indicating that we
do not overfit. However, the improvement is somewhat smaller
than for the training set, which is also what could be expected.
Also for the test set we see that the three functionals PBE,
RPBE, and PBE+U reach the same level of accuracy after
fitting although PBE+U is considerably better before fitting.
The performance of the TPSS functional does not seem to be
any better than any of the GGA functionals. In fact the rms
error for TPSS is only slightly reduced after fitting, while the
MAE is reduced more. This behavior can be traced to a single
system (Cs2S), which is clearly poorly corrected by the fitting
scheme. We have not been able to identify why this is the
case. It can be noted that Cs was not included in the database
considered by Stevanović et al. [4].
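
The train/test comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the fitting code used in the paper: it assumes that the correction enters as a composition-weighted sum of element-wise shifts determined by ordinary least squares on the training set, and that σ is the standard deviation of the error. All function names, compositions, and numbers below are hypothetical toy values; they are not data from Table IV.

```python
import numpy as np

def fit_reference_shifts(fractions, h_calc, h_expt):
    """Least-squares element-wise shifts d so that
    h_calc - fractions @ d approximates h_expt (assumed fitting form)."""
    shifts, *_ = np.linalg.lstsq(fractions, h_calc - h_expt, rcond=None)
    return shifts

def error_stats(fractions, h_calc, h_expt, shifts=None):
    """Return (MAE, standard deviation) of the error in the heat of
    formation, optionally after applying the fitted shifts."""
    err = h_calc - h_expt
    if shifts is not None:
        err = err - fractions @ shifts
    return np.mean(np.abs(err)), np.std(err)

# Toy system with three hypothetical elements; rows are atomic fractions.
true_shift = np.array([0.20, -0.10, 0.30])       # made-up values
train_frac = np.array([[0.5, 0.5, 0.0],
                       [0.5, 0.0, 0.5],
                       [0.0, 0.5, 0.5],
                       [2/3, 1/3, 0.0],
                       [1/3, 0.0, 2/3]])          # binaries only
train_expt = np.array([-1.0, -0.8, -1.5, -0.6, -1.1])
train_calc = train_expt + train_frac @ true_shift \
    + np.array([0.02, -0.01, 0.03, -0.02, 0.01])  # residual "noise"

test_frac = np.array([[1/3, 2/3, 0.0],            # held-out binary
                      [0.25, 0.25, 0.5],          # held-out ternary
                      [0.4, 0.2, 0.4]])           # held-out ternary
test_expt = np.array([-0.7, -1.2, -0.9])
test_calc = test_expt + test_frac @ true_shift \
    + np.array([-0.03, 0.04, 0.02])

# Fit on the training set only, then evaluate on the held-out set.
shifts = fit_reference_shifts(train_frac, train_calc, train_expt)
mae_train_before, _ = error_stats(train_frac, train_calc, train_expt)
mae_train_after, _ = error_stats(train_frac, train_calc, train_expt, shifts)
mae_test_before, sd_test_before = error_stats(test_frac, test_calc, test_expt)
mae_test_after, sd_test_after = error_stats(test_frac, test_calc, test_expt, shifts)
print("train MAE before/after:", mae_train_before, mae_train_after)
print("test  MAE before/after:", mae_test_before, mae_test_after)
```

A test-set MAE that drops after fitting, as in this toy case, is the signature of a fit that generalizes; a training error that drops while the test error stays flat or rises would indicate overfitting.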
The most interesting feature is that the mBEEF functional
already before fitting is of the same quality as the other
functionals after fitting. Furthermore, the improvement of the
mBEEF results using the fitting is only moderate. This means
that, by moving to mBEEF, the fitting procedure can be avoided
entirely at only a moderate cost in computational time (less
than a factor of 2) compared to the GGAs.
In compounds such as SrSe and Mn2SiO4 the predictions
with mBEEF remain the same after the fitting procedure;
however, the estimated error is significantly reduced, leading to
a large real error relative to the predicted uncertainty. It should
be noted that it is an inherent limitation in the ensemble
error estimation that fluctuations in the predictions can only
result from fluctuations within the defined model space (i.e.,
meta-GGA in this case). If errors appear which cannot be
described by such fluctuations, an underestimation of the error
may result.
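
The ratio of real error to predicted uncertainty discussed above can be checked with a short sketch. It assumes that the ensemble error estimate is the standard deviation of the ensemble predictions for each compound; the function names, the threshold of 3, and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def error_ratios(h_best, h_expt, h_ensemble):
    """Ratio of actual error to ensemble-estimated error per compound.

    h_ensemble has shape (n_compounds, n_ensemble_members); the error
    estimate is taken as the standard deviation over the members
    (an assumption of this sketch).
    """
    sigma = np.std(h_ensemble, axis=1)
    return (h_best - h_expt) / sigma, sigma

def flag_underestimated(ratios, threshold=3.0):
    """Indices of compounds whose actual error exceeds the estimate
    by more than `threshold` (hypothetical cutoff)."""
    return np.flatnonzero(np.abs(ratios) > threshold)

# Toy values: three compounds, four ensemble members each.
h_ensemble = np.array([[-1.00, -1.20, -0.80, -1.00],
                       [-0.50, -0.60, -0.40, -0.50],
                       [-2.00, -2.02, -1.98, -2.00]])
h_best = np.array([-1.00, -0.50, -2.00])
h_expt = np.array([-1.10, -0.45, -1.90])

ratios, sigma = error_ratios(h_best, h_expt, h_ensemble)
flagged = flag_underestimated(ratios)
print("ratios:", ratios)
print("flagged compounds:", flagged)
```

In the toy data the third compound has a very narrow ensemble spread but a sizable actual error, so it is flagged. This mirrors the situation in which the error source lies outside the model space sampled by the ensemble.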