MRSEC researchers are working together to integrate machine learning and other informatics tools with materials science while providing project-based, hands-on learning for undergraduates in cutting-edge interdisciplinary science. The “Informatics Skunkworks” (http://skunkworks.engr.wisc.edu/), initiated by MRSEC researcher Prof. Dane Morgan, is a group dedicated to realizing the potential of informatics for science and engineering. Computers can now do many things that were thought of as reserved for humans just a few years ago, including driving cars, answering phones, translating languages, reading and writing articles, and beating us at Jeopardy, chess, and Go. A transformative moment will come when computers can work with us to intelligently perform science and engineering tasks, allowing us to accelerate discovery and technology development to computer speeds. As a first step in that direction, informatics algorithms are increasingly changing how we deal with data in science and engineering.
The Skunkworks provides an environment where undergraduates can work together on developing informatics for science and engineering, learning advanced data science and machine learning skills, expanding their domain-specific knowledge, and gaining experience working with teams, faculty, and industry. The group now has over 15 participating undergraduates, five faculty, and materials projects that include problems from across the country in both academia and industry. Projects range from predicting ductile-to-brittle transition temperatures in nuclear steels to developing novel glass-forming Al alloys to exploring improved steels for commercial water heaters.
As a concrete example, Prof. Morgan has been working with a team in the Skunkworks to predict impurity diffusion coefficients. Working from a database of over 300 diffusion coefficients calculated using accurate but computationally demanding ab initio density functional methods (see http://diffusiondata.materialshub.org), the team has used machine learning tools to train computers to predict the migration barrier of an impurity in a host from the known properties of the impurity and host atoms. For instance, the Figure below shows the migration barriers of 13 impurities in an Al host, including those calculated by accurate atomistic modeling (actual barriers) and those predicted by multiple machine learning algorithms. The researchers found that the Gaussian Kernel Ridge Regression (GKRR) method gives the best overall predictive ability. In tests on data not included in the fitting (cross-validation tests), the GKRR method can predict activation energies to within 0.15 eV, which is close to the uncertainty in both the computed and experimental data. These fits can be used to predict activation energies for hundreds of new alloys, saving millions of dollars and years' worth of human and computing time compared to traditional methods of obtaining these values.
Comparison of ab initio calculated and machine learning predicted diffusion coefficients. Machine learning methods include Linear Regression (LR), Decision Tree (DT), Artificial Neural Network (ANN), and Gaussian Kernel Ridge Regression (GKRR).
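To make the fitting and cross-validation steps described above more concrete, the following is a minimal sketch of Gaussian kernel ridge regression with leave-one-out cross-validation, written in plain Python. It is not the team's code. The descriptors and target values are synthetic placeholders rather than entries from the diffusion database, and the function names (`fit_gkrr`, `predict_gkrr`, `loo_rmse`) and the hyperparameter values `gamma` and `lam` are assumptions chosen only for illustration.

```python
import math


def gaussian_kernel(a, b, gamma):
    """Gaussian (RBF) kernel: k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_gkrr(X, y, gamma, lam):
    """Return dual weights alpha that solve (K + lam * I) alpha = y."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y)


def predict_gkrr(X_train, alpha, x_new, gamma):
    """Predict the target at x_new as a kernel-weighted sum over training points."""
    return sum(a * gaussian_kernel(xi, x_new, gamma)
               for a, xi in zip(alpha, X_train))


def loo_rmse(X, y, gamma, lam):
    """Leave-one-out cross-validation: refit without point i, then predict point i."""
    sq_errors = []
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        alpha = fit_gkrr(X_tr, y_tr, gamma, lam)
        pred = predict_gkrr(X_tr, alpha, X[i], gamma)
        sq_errors.append((pred - y[i]) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Synthetic placeholder data (NOT real migration barriers): two made-up
# descriptors per sample and a smooth made-up target.
X = [[0.1 * i, 0.05 * i * i] for i in range(12)]
y = [math.sin(x0) + 0.5 * x1 for x0, x1 in X]

GAMMA, LAM = 1.0, 1e-3  # illustrative hyperparameters, not tuned values
alpha = fit_gkrr(X, y, GAMMA, LAM)
cv_rmse = loo_rmse(X, y, GAMMA, LAM)
print("leave-one-out RMSE on synthetic data:", cv_rmse)
```

In practice one would more likely use an established library. For example, scikit-learn's `KernelRidge` with `kernel="rbf"` fits the same model, with its `alpha` parameter playing the role of `lam` here. The kernel width and regularization strength would normally be selected by cross-validation rather than fixed by hand as in this sketch.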
H. Wu, T. Mayeshiba, and D. Morgan, High-Throughput ab-initio Dilute Solute Diffusion Database, to be published in Scientific Data (2016).