Designing the best molecular building blocks for battery components is like trying to create a recipe for a new kind of cake when you have billions of potential ingredients. The challenge is determining which ingredients work best together or, more simply, which produce an edible (or, in the case of batteries, a safe) product. But even with state-of-the-art supercomputers, scientists cannot precisely model the chemical characteristics of every molecule that could prove to be the basis of a next-generation battery material.
Instead, researchers at the U.S. Department of Energy’s (DOE) Argonne National Laboratory have turned to the power of machine learning and artificial intelligence to dramatically accelerate the process of battery discovery.
As described in two new papers, Argonne researchers first created a highly accurate database of roughly 133,000 small organic molecules that could form the basis of battery electrolytes. To do so, they used a computationally intensive model called G4MP2. This collection of molecules, however, represented only a small subset of 166 billion larger molecules that scientists wanted to probe for electrolyte candidates.
Because using G4MP2 to resolve each of the 166 billion molecules would have required an impossible amount of computing time and power, the research team used a machine learning algorithm to relate the precisely known structures from the smaller data set to much more coarsely modeled structures from the larger data set.
“When it comes to determining how these molecules work, there are big tradeoffs between accuracy and the time it takes to compute a result,” said Ian Foster, Argonne Data Science and Learning division director and author of one of the papers. “We believe that machine learning represents a way to get a molecular picture that is nearly as precise at a fraction of the computational cost.”
To provide a basis for the machine learning model, Foster and his colleagues used a less computationally taxing approach based on density functional theory, a quantum mechanical modeling framework used to calculate electronic structure in large systems. Density functional theory provides a good approximation of molecular properties, but is less accurate than G4MP2.
Refining the algorithm so that it could better characterize the broader class of organic molecules involved comparing the atomic positions of molecules computed with the highly accurate G4MP2 against those computed using only density functional theory. By using G4MP2 as a gold standard, the researchers could train the density functional theory model to incorporate a correction factor, improving its accuracy while keeping computational costs down.
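The idea of learning a correction factor can be illustrated with a minimal sketch. This is not the Argonne team's model: it swaps in a simple linear ridge regression, and the feature vectors, energies, and function names (`fit_delta_model`, `predict_high`) are hypothetical, synthetic stand-ins. It only shows the general pattern of fitting the difference between a high-accuracy and a low-accuracy energy on a small training set, then adding the predicted difference to cheap calculations for new molecules.

```python
import numpy as np


def fit_delta_model(features, e_low, e_high, ridge=1e-6):
    """Fit ridge-regression weights that predict the correction e_high - e_low."""
    # Append a constant column so the model can learn a systematic offset.
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    delta = e_high - e_low
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ delta)


def predict_high(features, e_low, weights):
    """Estimate the high-accuracy energy: cheap energy plus learned correction."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return e_low + X @ weights


# Synthetic data (an assumption for illustration, not real chemistry):
# each "molecule" has 4 descriptor values, and the low-accuracy energy
# differs from the high-accuracy one by a descriptor-dependent error.
rng = np.random.default_rng(0)
n_train, n_test, n_feat = 200, 50, 4
features = rng.normal(size=(n_train + n_test, n_feat))
e_high = 10.0 * rng.normal(size=n_train + n_test)
systematic_error = features @ np.array([0.5, -0.2, 0.1, 0.3]) + 1.0
noise = rng.normal(scale=0.01, size=n_train + n_test)
e_low = e_high - systematic_error + noise

# Train on molecules where both energies are known.
weights = fit_delta_model(features[:n_train], e_low[:n_train], e_high[:n_train])

# Apply the correction to held-out molecules using only the cheap energy.
e_pred = predict_high(features[n_train:], e_low[n_train:], weights)

mae_low = np.mean(np.abs(e_low[n_train:] - e_high[n_train:]))
mae_corrected = np.mean(np.abs(e_pred - e_high[n_train:]))
print(f"MAE of cheap energies:     {mae_low:.3f}")
print(f"MAE of corrected energies: {mae_corrected:.3f}")
```

In this toy setting the correction is linear by construction, so a linear model recovers it almost exactly. For real molecules the relationship between structure and the energy difference is far more complex, which is why a more expressive machine learning model is needed.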
“The machine learning algorithm gives us a way to look at the relationship between the atoms in a large molecule and their neighbors, to see how they bond and interact, and look for similarities between those molecules and others we know quite well,” said Argonne computational scientist Logan Ward, an author of one of the studies. “This will help us to make predictions about the energies of these larger molecules or the differences between the low- and high-accuracy calculations.”
“This whole project is designed to give us the biggest picture possible of battery electrolyte candidates,” added Argonne chemist Rajeev Assary, an author of both studies. “If we are going to use a molecule for energy storage applications, we need to know properties like its stability, and we can use this machine learning to predict properties of bigger molecules more accurately.”
A paper describing the formation of the G4MP2-based dataset, “Accurate quantum chemical energies for 133,000 organic molecules,” appeared in the June 27 online issue of Chemical Science.
A second paper describing the machine learning algorithm, “Machine learning prediction of accurate atomization energies of organic molecules from low-fidelity quantum chemical calculations,” appeared in the August 27 issue of MRS Communications.