Machine learning in materials informatics: recent applications and prospects
Rampi Ramprasad, Rohit Batra, Ghanshyam Pilania, Arun Mannodi-Kanakkithodi & Chiho Kim
npj Computational Materials 3:54 (2017)
doi:10.1038/s41524-017-0056-5
Published online: 13 December 2017
Abstract: Propelled partly by the Materials Genome Initiative, and partly by the algorithmic developments and the resounding successes of data-driven efforts in other domains, informatics strategies are beginning to take shape within materials science. These approaches lead to surrogate machine learning models that enable rapid predictions based purely on past data rather than by direct experimentation or by computations/simulations in which fundamental equations are explicitly solved. Data-centric informatics methods are becoming useful to determine material properties that are hard to measure or compute using traditional methods (due to the cost, time or effort involved) but for which reliable data either already exists or can be generated for at least a subset of the critical cases. Predictions are typically interpolative, involving fingerprinting a material numerically first, and then following a mapping (established via a learning algorithm) between the fingerprint and the property of interest. Fingerprints, also referred to as “descriptors”, may be of many types and scales, as dictated by the application domain and needs. Predictions may also be extrapolative, extending into new materials spaces, provided prediction uncertainties are properly taken into account. This article attempts to provide an overview of some of the recent successful data-driven “materials informatics” strategies undertaken in the last decade, with particular emphasis on the fingerprint or descriptor choices. The review also identifies some challenges the community is facing and those that should be overcome in the near future.
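The fingerprint-then-map workflow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the review: the element labels "A", "B" and "C", the synthetic property values, and the helper names `fingerprint`, `distance` and `predict` are all hypothetical. Nadaraya-Watson kernel regression stands in here for whichever learning algorithm a given study might use, and the distance to the nearest known fingerprint serves only as a crude indicator of how far a prediction extrapolates beyond the training data.

```python
import math

# Hypothetical element labels; placeholders, not real chemistry.
ELEMENTS = ["A", "B", "C"]


def fingerprint(composition):
    """Map a composition dict (element -> amount) to a vector of fractions."""
    total = sum(composition.values())
    return [composition.get(el, 0.0) / total for el in ELEMENTS]


def distance(u, v):
    """Euclidean distance between two fingerprints."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def predict(query, train_X, train_y, length_scale=0.2):
    """Nadaraya-Watson kernel regression with a Gaussian kernel.

    Returns (estimate, nearest_distance). The estimate is a weighted
    average of known property values; nearest_distance is a crude
    indicator of extrapolation (large value = far from known data).
    """
    dists = [distance(query, x) for x in train_X]
    weights = [math.exp(-0.5 * (d / length_scale) ** 2) for d in dists]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed: fall back to the nearest neighbour.
        nearest = dists.index(min(dists))
        return train_y[nearest], dists[nearest]
    estimate = sum(w * y for w, y in zip(weights, train_y)) / total
    return estimate, min(dists)


# Synthetic training set: the "property" is a made-up linear function
# of the fractions, used only so the example has something to learn.
train_compositions = [
    {"A": 1}, {"B": 1}, {"C": 1},
    {"A": 1, "B": 1}, {"B": 1, "C": 1}, {"A": 1, "C": 1},
]
train_X = [fingerprint(c) for c in train_compositions]
train_y = [1.0 * x[0] + 2.0 * x[1] + 3.0 * x[2] for x in train_X]

if __name__ == "__main__":
    query = fingerprint({"A": 2, "B": 1, "C": 1})
    estimate, nearest = predict(query, train_X, train_y)
    print("estimate:", estimate, "distance to nearest known case:", nearest)
```

A kernel-weighted average can only interpolate between known property values, which mirrors the abstract's point that such predictions are typically interpolative. A model intended for extrapolation would need a proper uncertainty estimate rather than the simple distance check used here.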