Sizing up feature descriptors for macromolecular machine learning with polymeric biomaterials
Posted: 2023-08-21

Samantha Stuart, Jeffrey Watchorn & Frank X. Gu

npj Computational Materials 9: 102 (2023)
doi.org/10.1038/s41524-023-01040-5
Published online: 12 June 2023

Abstract: It has proved challenging to represent the behavior of polymeric macromolecules as machine learning features for biomaterial interaction prediction. There are several approaches to this representation, yet no consensus for a universal representational framework, in part due to the sensitivity of biomacromolecular interactions to polymer properties. To help navigate the process of feature engineering, we provide an overview of popular classes of data representations for polymeric biomaterial machine learning while discussing their merits and limitations. Generally, increasing the accessibility of polymeric biomaterial feature engineering knowledge will contribute to the goal of accelerating clinical translation from biomaterials discovery.
Editorial Summary

Macromolecular data representation: Polymeric biomaterial feature engineering

The selection of feature descriptors to encode a dataset for machine learning is one of the most important decisions underlying model quality. Because small molecules are constrained in size and structure, they can be represented by standardized numeric descriptors for simulation, molecular property prediction, and virtual screening. This ability to encode small molecules numerically provided part of the foundation on which the chemoinformatics domain achieved data-driven research success in small-molecule drug discovery. Inspired by that success, machine learning frameworks for studying polymers often use feature descriptors based on the attributes of drug-like small molecules. The intrinsic limitation of applying small-molecule-based feature representations to biomaterials is that small-molecule descriptors cannot accommodate the heterogeneity of polymer properties, and alterations in these macromolecular properties can yield significant changes in predictive target outcomes. There is therefore a clear need for dedicated macromolecular descriptors that facilitate the training of predictive models in this domain. However, it has proved challenging to represent the behavior of polymeric macromolecules as machine learning features for biomaterial interaction prediction. In this work, Samantha Stuart et al. from the Institute of Biomedical Engineering, University of Toronto, provided an overview of the classes of macromolecular data representation applicable to polymeric biomaterial machine learning frameworks and discussed their merits and limitations. The authors focused on the four most popular classes of macromolecular representation applicable to polymer and biomaterials research: domain-specific descriptors, molecular fingerprints, string descriptors, and graph descriptors.
Throughout the review, they highlighted examples of research applying polymer data representations that can contribute to predictive biomaterial design, in which polymers and biopolymers are proactively selected for use in a biomaterial to achieve targeted biological outcomes. This work will benefit researchers seeking greater technical context for predictive polymer biomaterial design, as well as computer science researchers seeking greater domain context when building predictive models of large polymer systems for biomaterials engineering.
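To make the four representation classes concrete, the following is a minimal sketch in Python (standard library only) that encodes one polymer in each of them. It is not taken from the review: the `PolymerRecord` type, the helper functions, and the numeric values for poly(ethylene glycol) are all hypothetical and purely illustrative. The hashed character n-gram vector is a toy stand-in for a molecular fingerprint, not an implementation of any published fingerprint scheme.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PolymerRecord:
    """Toy polymer record; all fields and values are illustrative assumptions."""
    repeat_unit: str                # string descriptor: SMILES-like text, '*' marks chain ends
    degree_of_polymerization: int   # domain-specific descriptor
    dispersity: float               # domain-specific descriptor (Mw/Mn)


def domain_descriptors(polymer):
    """Domain-specific descriptors: hand-picked polymer properties as numbers."""
    return [float(polymer.degree_of_polymerization), polymer.dispersity]


def hashed_ngram_fingerprint(text, n=2, n_bits=16):
    """Fold character n-grams of a string into a fixed-length bit vector.

    A toy substitute for a substructure fingerprint: real fingerprints hash
    chemical substructures, whereas this hashes raw character pairs.
    """
    bits = [0] * n_bits
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        digest = hashlib.sha256(gram.encode("utf-8")).digest()
        bits[int.from_bytes(digest[:4], "big") % n_bits] = 1
    return bits


def linear_chain_graph(atoms):
    """Graph descriptor for an unbranched backbone: node labels plus an edge list."""
    edges = [(i, i + 1) for i in range(len(atoms) - 1)]
    return atoms, edges


# Hypothetical example: a poly(ethylene glycol) chain with made-up property values.
peg = PolymerRecord(repeat_unit="*CCO*", degree_of_polymerization=45, dispersity=1.05)

features = {
    "domain": domain_descriptors(peg),
    "fingerprint": hashed_ngram_fingerprint(peg.repeat_unit),
    "string": peg.repeat_unit,
    "graph": linear_chain_graph(["C", "C", "O"]),
}

for name, value in features.items():
    print(name, value)
```

In this sketch only the domain-specific descriptors carry chain-level information such as length and dispersity. The other three encodings are built from the repeat unit alone, which illustrates why descriptors borrowed from small molecules can miss the heterogeneity of polymer properties.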

 