Prediction of chemical reaction yields with large-scale multi-view pre-training
Journal of Cheminformatics (IF 8.6), Pub Date: 2024-02-25, DOI: 10.1186/s13321-024-00815-2
Runhan Shi, Gufeng Yu, Xiaohong Huo, Yang Yang

Developing machine learning models with high generalization capability for predicting chemical reaction yields is of significant interest and importance. The efficacy of such models depends heavily on the representation of chemical reactions, which has commonly been learned from SMILES or graphs of molecules using deep neural networks. However, the progression of chemical reactions is inherently determined by the molecular 3D geometric properties, which have been recently highlighted as crucial features in accurately predicting molecular properties and chemical reactions. Additionally, large-scale pre-training has been shown to be essential in enhancing the generalization capability of complex deep learning models. Based on these considerations, we propose the Reaction Multi-View Pre-training (ReaMVP) framework, which leverages self-supervised learning techniques and a two-stage pre-training strategy to predict chemical reaction yields. By incorporating multi-view learning with 3D geometric information, ReaMVP achieves state-of-the-art performance on two benchmark datasets. Notably, the experimental results indicate that ReaMVP has a significant advantage in predicting out-of-sample data, suggesting an enhanced generalization ability to predict new reactions.

Scientific Contribution: This study presents the ReaMVP framework, which improves the generalization capability of machine learning models for predicting chemical reaction yields. By integrating sequential and geometric views and leveraging self-supervised learning techniques with a two-stage pre-training strategy, ReaMVP achieves state-of-the-art performance on benchmark datasets. The framework demonstrates superior predictive ability for out-of-sample data and enhances the prediction of new reactions.

Updated: 2024-02-26