# Electrolyte Design Lab

## Introduction

• "Electrolyte Design Lab"
One component of a lithium-ion battery (LIB) is the electrolyte, which is composed of solvents and lithium salts. Lithium ions move through the electrolyte, transporting charge from one electrode to the other. The electrolyte must have several properties, such as electrochemical stability, thermal stability, and an appropriate potential window. The "Electrolyte Design Lab" provides tools for calculating the basic properties needed in designing new electrolytes for LIBs. Hartree-Fock (HF) theory is the level of calculation used in the "Electrolyte Design Lab". The available properties are as follows:
• HOMO
• LUMO
• Viscosity
• Flash Point

• Modules in "Electrolyte Design Lab"
• Lobby : Here, you can see the calculation results.
• Working Studio : You can input chemical structures by loading, drawing, and enumerating chemical structures and calculate the selected properties.
• Job Control : All tasks submitted by the user can be monitored.

## User Manual : Lobby

When you log in to the "Electrolyte Design Lab", you are in the "Lobby". To go back to the "Lobby" from the "Working Studio", click the "Lobby" button in the menu bar.
You can find the calculation results in the "Lobby".
When you click the result, you can view the molecular structure on the left side.

• Lobby of "Electrolyte Design Lab"
## User Manual : Working Studio

When you log in to the "Electrolyte Design Lab", you are in the "Lobby". In the menu bar, click the "Working Studio" button to enter the "Working Studio", where you can calculate several properties of electrolytes.
The "Working Studio" consists of the "Draw" Explorer, the "Analysis" Explorer, and the "Job Information" Explorer.

• Working Studio of "Electrolyte Design Lab"
### "Draw" Explorer

You can generate the molecular structures that you want to calculate by loading files or drawing structures.
* Load molecular structure files from the local folder.
* Draw molecular structures using the drawing tool "Sketcher".

### "Job Information" Explorer

Describe the "Job" you intend to submit.
• Job Name : Input the task title.
• Description : Input a brief description of the job.

## User Manual : Job Control

When you click the "Job Control" button at the top right of the window, the "Job Control" window pops up. There you can monitor the current status of your jobs.

• Job Status of "Electrolyte Design Lab"

## Tutorials

### Draw a molecular structure using Sketcher and submit the Task

Objective: Draw an electrolyte molecule using the Sketcher provided in the "Electrolyte Design Lab" and calculate the properties of the molecule.

1. Click the "Working Studio" button at the top left of the window.
2. Enter the following information into the "Job Information" Explorer.
   Job Name : EC
3. Click the "Draw" button in the "Draw" Explorer.
4. Draw the EC structure using the drawing tool.
   1. Click the black dot of the "Benzene" button and select Cyclopentane.
   2. Click once on the work space; a cyclopentane is drawn in the work space.
   3. Click the black dot of the "Single Bond" button and select double bond; the button icon changes to the double-bond icon.
   4. Make a double bond at one site of the cyclopentane by clicking the site.
   5. Click "O" on the right side bar menu to change the element from carbon to oxygen, and click the "C" of the "CH2" group of the structure.
   6. Click the "OK" button to send the drawn structure to the Working Studio.
   7. Click the "Close" button to return to the Working Studio.
5. The transferred structure is shown in the Working Studio.
6. Click the "Submit" button to run the task.
7. To view the current status of your job, click the "Job Status" button on the menu bar.

## Technical Information

### Quantum Calculation Methods

We provide two levels of theory for calculating the HOMO, LUMO, ionization potential, and electron affinity of electrolyte molecules: a semiempirical (SEM) method and density functional theory (DFT). The SEM method is less accurate, but it is often used for fast calculation and screening. DFT is more accurate, but its computational cost is much higher; select DFT when more accurate values are needed. At both levels, the results depend on the detailed calculation conditions. In an SEM calculation, the choice of Hamiltonian is critical to the computed properties. A successful DFT calculation requires careful selection of a larger set of options. We tested both levels and set the detailed conditions to those that gave the best correlation with higher-level calculation results.

#### Selected electrolyte structures and Post HF level calculations

As a reference for finding the optimum SEM Hamiltonian and DFT conditions, we selected 20 well-known electrolyte structures and reviewed three levels of DFT calculation from the literature. Commercially available electrolytes in common use were chosen as the input structures for both the reference calculations and the DFT/SEM calculations in the Electrolyte Design Lab. Properties such as the HOMO, LUMO, vertical IP, vertical EA, adiabatic IP, and adiabatic EA were calculated under the post-HF-level reference conditions and under each configuration of DFT and SEM options. We then compared the results one by one to find the DFT and SEM configurations that show the least error relative to the reference properties.
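The comparison step can be sketched as follows. This is a minimal illustration only: the mean absolute error is assumed as the error measure (the text does not name one), and the configuration names and energy values are hypothetical placeholders, not results from the Electrolyte Design Lab.

```python
from statistics import mean


def mean_absolute_error(predicted, reference):
    """Average absolute deviation between two equally long lists of values."""
    if len(predicted) != len(reference):
        raise ValueError("lists must have the same length")
    return mean(abs(p - r) for p, r in zip(predicted, reference))


def best_configuration(results, reference):
    """Return (name, error) of the configuration closest to the reference."""
    errors = {name: mean_absolute_error(values, reference)
              for name, values in results.items()}
    name = min(errors, key=errors.get)
    return name, errors[name]


# Hypothetical HOMO energies (eV) for three molecules; not real data.
reference_homo = [-8.0, -7.5, -7.9]
candidate_results = {
    "config_A": [-8.4, -7.9, -8.2],  # assumed output of one option set
    "config_B": [-8.1, -7.4, -8.0],  # assumed output of another option set
}

best_name, best_error = best_configuration(candidate_results, reference_homo)
print(best_name, round(best_error, 3))
```

The same function can be applied per property (HOMO, LUMO, IP, EA) to see whether one configuration is best across all of them.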

• Electrolyte structures
O. Borodin, et al., J. Phys. Chem. C 117 (2013) 8661; K. Xu, Chem. Rev. 114 (2014) 11503.
• Post HF level calculations
Three DFT calculation conditions are suitable for predicting electrolyte properties, with results comparable to high-level post-HF calculations such as G4MP2. Borodin and coworkers used M05-2X/cc-pVTZ and LC-ωPBE/6-31+G(d,p) to predict oxidation energy and free energy. Han and coworkers used B3PW91/6-311G(d,p) for descriptor calculations to find correlations between structure and electrolyte properties (Y.-K. Han, et al., Curr. Appl. Phys. 14 (2014) 897).
- M05-2X / cc-pVTZ
- LC-ωPBE / 6-31+G(d,p)
- B3PW91 / 6-311G(d,p)

All geometries, including the neutral and charged states, were fully optimized, and each minimum-energy structure was confirmed by vibrational frequency analysis.
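Because both neutral and charged geometries are optimized, vertical and adiabatic quantities can be obtained from total-energy differences. The sketch below uses the usual definitions (vertical: charged state at the neutral geometry; adiabatic: charged state at its own optimized geometry). The energy values and dictionary keys are hypothetical and serve only to show the arithmetic.

```python
HARTREE_TO_EV = 27.211386  # conversion factor from Hartree to eV


def ionization_potential(e_neutral, e_cation):
    """IP in eV from total energies in Hartree: E(cation) - E(neutral)."""
    return (e_cation - e_neutral) * HARTREE_TO_EV


def electron_affinity(e_neutral, e_anion):
    """EA in eV from total energies in Hartree: E(neutral) - E(anion)."""
    return (e_neutral - e_anion) * HARTREE_TO_EV


# Hypothetical total energies in Hartree; not real calculation results.
energies = {
    "neutral_opt": -342.500,             # neutral at its optimized geometry
    "cation_at_neutral_geom": -342.100,  # single point, used for vertical IP
    "cation_opt": -342.110,              # relaxed cation, used for adiabatic IP
    "anion_at_neutral_geom": -342.480,   # single point, used for vertical EA
    "anion_opt": -342.495,               # relaxed anion, used for adiabatic EA
}

vertical_ip = ionization_potential(energies["neutral_opt"],
                                   energies["cation_at_neutral_geom"])
adiabatic_ip = ionization_potential(energies["neutral_opt"],
                                    energies["cation_opt"])
vertical_ea = electron_affinity(energies["neutral_opt"],
                                energies["anion_at_neutral_geom"])
adiabatic_ea = electron_affinity(energies["neutral_opt"],
                                 energies["anion_opt"])

print(f"IP: vertical {vertical_ip:.2f} eV, adiabatic {adiabatic_ip:.2f} eV")
print(f"EA: vertical {vertical_ea:.2f} eV, adiabatic {adiabatic_ea:.2f} eV")
```

Geometry relaxation can only lower the energy of the charged state, so the adiabatic IP is never larger than the vertical IP, and the adiabatic EA is never smaller than the vertical EA.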

### Quantitative Structure-Property Relationship Models

The quantitative structure-property relationship (QSPR) is a tool for building a model that predicts the properties of new materials. The first step of QSPR is data collection. To build models with high reliability, it should be confirmed that the data have been produced as homogeneously as possible; homogeneity here means that the experimental conditions for data acquisition are controlled consistently, following the same procedure.

In QSPR, predictions are made from the chemical structure. To relate the collected data to the corresponding chemical structures, the information contained in a chemical structure must be transformed into numerical values. Such a numerical representation of the chemical structure is called a "descriptor". Using a set of descriptors $\lbrace D_1, D_2, \cdots, D_n \rbrace$, a prediction model function is built as follows:

$property = f(D_1, D_2, \cdots, D_n).$

Descriptors are usually categorized as spatial, electronic, topological, thermodynamic, or atomistic. We calculated all types of descriptors and selected the most relevant ones using the genetic function approximation (GFA).

We use two machine learning methods to build the prediction function: (1) artificial neural networks and (2) random forests. The computed material dataset in our database is used as the training set, together with the selected descriptors.

#### Artificial Neural Network based QSPR

Among machine learning techniques, we used an artificial neural network (ANN) as implemented in an R package.

##### ANN-QSPR for Prediction of Viscosity

The transport properties of the electrolyte are important for the charge/discharge rate performance of Li-ion batteries. The viscosity of the electrolyte solvent affects transport phenomena such as lithium-ion conductivity and the wettability of the electrodes and separator. Viscosity is therefore one of the important properties of an electrolyte.

• Data collection
The experimental viscosity data of 440 organic compounds were taken from the literature,[1] and data for an additional 15 organic compounds used as solvents for lithium-ion batteries were collected. The molecular set is structurally diverse and is composed of molecules containing C, H, O, N, S, and all halogens.
For the 455 compounds, data at several temperatures were collected, and a total of 1239 data points were used in the QSPR. As in Ref.[1], we split the data into two groups: a training set with 1160 data points and a test set with 135 data points.
• Descriptors
Using GFA, we selected the following descriptors.
• Temperature
• Rotatable bonds (Fast descriptor): the number of rotatable bonds
• Chi(1) (valence modified) (Fast descriptor): Kier and Hall valence-modified first-order connectivity index. This descriptor contains structural information by encoding the number of edges (bonds) in the molecular graph.[2]
• E-state keys (sums):S_sOH (Fast descriptor): Electrotopological state keys for single-bonded OH [3]
• Hydrogen bond donor (Fast descriptor): the number of hydrogen-bond donors
• Solvation Energy (Molecular descriptor, solvent = acetone): Solvation energy of the molecule in acetone

From the graph above, we can see that Chi(1) and Rotatable bonds are the most frequently used descriptors across the generations. This means that the topology and flexibility of a molecule are important factors in determining its viscosity.

• Prediction model
We built a prediction model using an artificial neural network. As mentioned above, we trained the model with 1160 data points. The network structure is 6-7-1, with 7 nodes in one hidden layer. The training results of the neural network model are as follows.

For the training set, the $R^2$ of the prediction model is 0.9563. The residuals are well distributed above and below 0.0.

For the test set, the $R^2$ of the prediction model is 0.8571.
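To make the 6-7-1 notation concrete, the sketch below evaluates a network with 6 inputs (the selected descriptors, including temperature), 7 hidden nodes, and 1 output. It is not the R implementation used here: the tanh hidden activation, the linear output node, the random untrained weights, and the input values are all assumptions for illustration.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights (n_out rows of n_in values) and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(x, hidden, output):
    """One tanh hidden layer followed by a single linear output node."""
    w_h, b_h = hidden
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w_h, b_h)]
    w_o, b_o = output
    return sum(w * hi for w, hi in zip(w_o[0], h)) + b_o[0]


def count_parameters(*layers):
    """Total number of weights and biases in the given layers."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


rng = random.Random(0)
hidden = init_layer(6, 7, rng)   # 6 descriptors -> 7 hidden nodes
output = init_layer(7, 1, rng)   # 7 hidden nodes -> 1 predicted property

# Six scaled descriptor values for one data point (hypothetical numbers).
x = [0.2, -0.1, 0.5, 0.0, 0.3, -0.4]
y = forward(x, hidden, output)
n_params = count_parameters(hidden, output)
print(n_params)
```

A 6-7-1 network has 6×7 + 7 hidden-layer parameters and 7×1 + 1 output parameters, 57 in total, which is small compared with the 1160 training points.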
##### ANN-QSPR for Prediction of Lithium Cation Basicity (LCB)

The lithium cation basicity (LCB) in the gas phase is defined as the negative of the Gibbs free energy of the Li+ complex formation reaction shown below. The LCB describes the affinity of a molecule for Li+ in the gas phase, so this property is useful for the design of additives.

B(g) + Li+(g) → B-Li+(g)
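Following this definition, the LCB is the negative of the reaction free energy, LCB = −[G(B-Li+) − G(B) − G(Li+)]. A minimal sketch of the arithmetic, with hypothetical free-energy values (any consistent energy unit works):

```python
def lithium_cation_basicity(g_base, g_li_ion, g_complex):
    """LCB = -(G(complex) - G(base) - G(Li+)), in the unit of the inputs."""
    delta_g = g_complex - (g_base + g_li_ion)
    return -delta_g


# Hypothetical gas-phase Gibbs free energies in kJ/mol; not real data.
g_base = -1000.0     # free base B
g_li_ion = -500.0    # Li+ cation
g_complex = -1650.0  # B-Li+ complex

lcb = lithium_cation_basicity(g_base, g_li_ion, g_complex)
print(lcb)
```

A more stable complex gives a more negative reaction free energy and therefore a larger LCB.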

• Data collection
The experimental LCB data of 229 organic compounds were taken from the literature.[4] The molecular set is structurally diverse and is composed of molecules containing C, H, O, N, S, P, and all halogens.
A total of 229 data points were used in the QSPR. We split the data into two groups: a training set with 178 data points and a test set with 51 data points.
• Descriptors
Using GFA, we selected the following descriptors.
• AlogP98 (Fast Descriptors)[5] : log of the partition coefficient, atom-type value
• Carboxylic Acid (Fragment Descriptors) : the number of the carboxylic acid groups
• Total Dipole (DMol3 Molecular)
• PPSA1 (Jurs Descriptors) : partial positive surface area; the sum of the solvent-accessible surface areas of all positively charged atoms
• Subgraph Counts (1) (Fast Descriptors)[6] : the number of first-order subgraphs in the molecular graph, which is the number of edges that connect the vertices of the molecular graph; the number of bonds in the molecule
• DPSA1 (Jurs Descriptors) : difference in charged partial surface areas; the partial positive surface area minus partial negative surface area
• Total Dipole^2 (DMol3 Molecular) : the square of the Total Dipole (DMol3 Molecular)

• Prediction model
We built a prediction model using an artificial neural network. As mentioned above, we trained the model with 178 data points. The network structure is 7-2-1, with 2 nodes in one hidden layer. The training results of the neural network model are as follows.

For the training set, the $R^2$ of the prediction model is 0.9129. The residuals are well distributed above and below 0.0.

For the test set, the $R^2$ of the prediction model is 0.8992.
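The 178/51 division of the 229 LCB data points can be reproduced in form with a simple shuffled split. How the original split was chosen is not stated, so the random shuffle and the fixed seed below are assumptions; the integer records stand in for real data points.

```python
import random


def split_dataset(records, n_train, seed=0):
    """Shuffle a copy of the records and cut it into training and test sets."""
    if not 0 < n_train < len(records):
        raise ValueError("n_train must be between 1 and len(records) - 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    return shuffled[:n_train], shuffled[n_train:]


records = list(range(229))  # placeholder IDs for the 229 compounds
train, test = split_dataset(records, n_train=178)
print(len(train), len(test))
```

Keeping the test set out of training is what makes the test-set $R^2$ an estimate of performance on new molecules.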
##### ANN-QSPR for Prediction of Flash Point

The flash point of the electrolyte is important for safety. It is predicted using a QSPR model.

• Data collection
The experimental flash point data (closed cup method) of 325 organic compounds were taken from the literature.[7] The molecular set is structurally diverse and is composed of molecules containing C, H, O, N, S, P, Cl, and Br.
A total of 325 data points were used in the QSPR. We split the data into two groups: a training set with 273 data points and a test set with 52 data points.
• Descriptors
• Element count (Atomistic Descriptors) : the number of carbon atoms in the structure
• Hydrogen bond acceptor (Fast Descriptors) : the number of hydrogen-bond acceptors
• Subgraph counts (1): path (Fast Descriptors)[6] : the number of first-order subgraphs in the molecular graph, which is the number of edges that connect the vertices of the molecular graph; the number of bonds in the molecule
• FNSA3 (Jurs Descriptors) : fractional atomic charge-weighted negative surface area; the atomic charge-weighted negative surface area divided by the total molecular solvent-accessible area
• Hydroxy (Fragment Counts) : the number of the hydroxyl groups
• E-state keys (sums): S_ssO (Fast Descriptors) : electrotopological state keys for oxygen with two single bonds
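• Kappa-2 (alpha-modified) (Fast Descriptors) : Kier's second-order shape index, which compares the molecular graph with "minimal" and "maximal" graphs and encodes the degree of branching.
$\kappa_2 = \frac{(N + \alpha - 1)(N + \alpha - 2)^2}{(P + \alpha)^2}$, where $\alpha = \sum_{i}\left(\frac{r_i}{r_{sp^3}} - 1\right)$, $N$ is the number of non-hydrogen atoms, $P$ is the number of two-bond paths in the hydrogen-suppressed graph, $r_i$ is the covalent radius of atom $i$, and $r_{sp^3}$ is the covalent radius of an $sp^3$ carbon atom.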
• E-state keys (sums): S_tN (Fast Descriptors) : electrotopological state keys for nitrogen with one triple bond
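Several of these descriptors are simple counts on the hydrogen-suppressed molecular graph. The sketch below computes the carbon count, the first-order subgraph count (number of bonds), and Kappa-2 in the form $(N+\alpha-1)(N+\alpha-2)^2/(P+\alpha)^2$ with $P$ the number of two-bond paths. It is not the descriptor code used by the tool: the graph encoding is hypothetical, and $\alpha$ is set to 0 for simplicity, which ignores the atom-type correction.

```python
from collections import Counter


def element_count(atoms, symbol="C"):
    """Number of atoms of the given element."""
    return Counter(atoms)[symbol]


def first_order_subgraph_count(bonds):
    """First-order subgraphs are the edges, i.e. the bonds."""
    return len(bonds)


def two_bond_path_count(n_atoms, bonds):
    """Pairs of bonds sharing an atom: sum of d*(d-1)/2 over atom degrees d."""
    degree = [0] * n_atoms
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1
    return sum(d * (d - 1) // 2 for d in degree)


def kappa2(n_atoms, bonds, alpha=0.0):
    """Second-order shape index; alpha=0 gives the unmodified index."""
    p2 = two_bond_path_count(n_atoms, bonds)
    a = n_atoms + alpha
    return (a - 1) * (a - 2) ** 2 / (p2 + alpha) ** 2


# Ethylene carbonate (EC), heavy atoms only: a five-membered ring
# C0-O2-C3-C4-O5 with a carbonyl oxygen O1 attached to C0.
atoms = ["C", "O", "O", "C", "C", "O"]
bonds = [(0, 1), (0, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

n_carbon = element_count(atoms)
n_bonds = first_order_subgraph_count(bonds)
n_paths = two_bond_path_count(len(atoms), bonds)
k2 = kappa2(len(atoms), bonds)
print(n_carbon, n_bonds, n_paths, round(k2, 4))
```

The double bond in EC counts as a single edge, since the graph records connectivity rather than bond order.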

• Prediction model
We built a prediction model using an artificial neural network. As mentioned above, we trained the model with 273 data points. The network structure is 8-1-1, with 1 node in one hidden layer. The training results of the neural network model are as follows.

For the training set, the $R^2$ of the prediction model is 0.8972.

For the test set, the $R^2$ of the prediction model is 0.8704.
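The $R^2$ values quoted for the training and test sets can be computed as sketched below. The coefficient-of-determination form $1 - SS_{res}/SS_{tot}$ is assumed here; the text does not say whether this form or the squared correlation coefficient was used. The numbers are illustrative only.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted):
        raise ValueError("lists must have the same length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Illustrative values only; not flash point data.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

r2 = r_squared(observed, predicted)
residuals = [o - p for o, p in zip(observed, predicted)]
print(round(r2, 4))
```

Comparing the value on the training set with the value on the held-out test set, as done above, shows how much of the fit carries over to unseen compounds.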

#### Random Forest based QSPR

For decision-tree-based machine learning, we implement Random Forests (RF). RF is an ensemble method that improves on the single decision tree. It uses bagging, short for bootstrap aggregating, to address the bias-variance trade-off.

A model should ideally have both low bias and low variance, but the two usually cannot be minimized at the same time: a model with low bias tends to have high variance. Bagging randomly draws many subsets of the data (say, N subsets) and builds one prediction model for each. From these subset models, RF evaluates the bias and variance over the whole dataset. With a sufficiently large dataset, this scheme helps achieve both accuracy (low bias) and stable predictions (low variance).

RF essentially requires only three parameters: (1) the number of decision trees (subsets), (2) the number of descriptors considered, and (3) the subtree size (sampling rate). Each tree is built like an ordinary decision tree, except that every time a split has to be made, only a small random subset of the features is considered rather than the full set. The final prediction is the average over all trees (bagging). In essence, each decision tree partitions the dataset into groups, so the accuracy of an RF model is evaluated by how correctly it separates the dataset.
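The three parameters can be seen in a deliberately small sketch of the idea. This is not the scikit-based implementation used here: each "tree" is reduced to a single split (a stump), and the data are synthetic. The function and parameter names (`fit_forest`, `n_trees`, `n_features`, `sample_rate`) are hypothetical.

```python
import random
from statistics import mean


def fit_stump(rows, targets, feature_ids):
    """Best single-threshold split (least squares) over the allowed features."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if not left or not right:
                continue
            ml, mr = mean(left), mean(right)
            sse = (sum((t - ml) ** 2 for t in left)
                   + sum((t - mr) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, f, threshold, ml, mr)
    if best is None:  # no valid split: predict the sample mean everywhere
        m = mean(targets)
        return (feature_ids[0], float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, ml, mr = stump
    return ml if row[f] <= threshold else mr


def fit_forest(rows, targets, n_trees, n_features, sample_rate, seed=0):
    """Bagging: each stump sees a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n_sample = max(1, int(sample_rate * len(rows)))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(n_sample)]  # with replacement
        feats = rng.sample(range(len(rows[0])), n_features)
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all stumps."""
    return mean(predict_stump(s, row) for s in forest)


# Synthetic data: feature 0 determines the target, feature 1 is noise.
rows = [[i, (i * 7) % 5] for i in range(20)]
targets = [0.0 if i < 10 else 1.0 for i in range(20)]

forest = fit_forest(rows, targets, n_trees=50, n_features=1, sample_rate=1.0)
low = predict_forest(forest, rows[2])
high = predict_forest(forest, rows[17])
print(round(low, 3), round(high, 3))
```

Because each stump sees only one randomly chosen feature, stumps built on the noise feature pull both predictions toward the overall mean, while the averaged forest still ranks the two rows correctly.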

As the size of the dataset increases, RF can draw more subsets. Achieving low bias depends strongly on the number of sampled subsets in RF. As we add more data to our database, the accuracy of the machine learning models will therefore improve.

We implement RF using the scikit-learn package for Python.

##### RF-QSPR for Viscosity

The viscosity of electrolyte solvents plays a key role in transport phenomena, including Li-ion conduction and wettability, which makes it an important electrolyte property.

• Data
We use the same data as in the ANN-QSPR example. The experimental viscosity data of 440 organic compounds were taken from the literature,[1] and data for 15 additional organic compounds used as solvents for lithium-ion batteries were collected.
For the 455 compounds, data at several temperatures were collected, and a total of 1239 data points were used in the QSPR. Following Ref.[1], we separated the dataset into training and test sets.

• Descriptors
We feed all 196 descriptors into the RF model. No preselected descriptor set is required, because the decision tree, the basic learning algorithm underlying RF, is essentially a classifier that provides categorical results for the given dataset.

• Results
• 200 training sets

The $R^2$ value is 0.8731. This is lower than in the ANN case, but the model performs much better in terms of bias. It implies that, at the cost of a slightly larger variance, this model provides a less biased prediction of viscosity. Using both models, RF as well as ANN, should give a better prediction model. A disadvantage of RF, that it is a black-box model that gives no analytic understanding, is compensated for by the ANN prediction model.

## References

[1] T. Suzuki, R.-U. Ebert, and G. Schüürmann, J. Chem. Inf. Comput. Sci. 41, 776-790 (2001).

[2] L. B. Kier and L. H. Hall, Chemometrics Series, Vol. 9, Research Studies Press Ltd., New York (1985).

[3] L. H. Hall and L. B. Kier, J. Chem. Inf. Comput. Sci. 35, 1039-1045 (1995).

[4] J. Jover, R. Bosque, and J. Sales, J. Chem. Inf. Comput. Sci. 44, 1727-1736 (2004).

[5] A. K. Ghose, V. N. Viswanadhan, and J. J. Wendoloski, J. Phys. Chem. 102, 3762-3772 (1998).

[6] L. B. Kier and L. H. Hall, Medicinal Chemistry, Vol. 14, G. deStevens, Ed., Academic Press, New York (1976).

[7] International Chemical Safety Cards, http://www.inchem.org/pages/icsc.html

[8] L. Breiman, Machine Learning (1996).

## Contact

Dr. Dong Hyen Chung, Insilico Co. Ltd.