Vib2Mol: from vibrational spectra to molecular structures-a versatile deep learning model

Xinyu Lu¹,², Hao Ma¹,*, Hui Li³, Jia Li⁴, Yuqiang Li²,⁵, Tong Zhu²,⁶, Guokun Liu⁷,*, Bin Ren¹,²,*

¹College of Chemistry and Chemical Engineering, Xiamen University; ²Shanghai Innovation Institute; ³School of Informatics, Xiamen University; ⁴Institute of Artificial Intelligence, Xiamen University; ⁵Shanghai Artificial Intelligence Laboratory; ⁶School of Chemistry and Molecular Engineering, East China Normal University; ⁷College of the Environment and Ecology, Xiamen University

*Corresponding authors

Abstract

Chemical and biological research is poised for a paradigm shift enabled by autonomous, closed-loop experimentation with real-time, self-directed decision-making. Spectrum-to-structure correlation, which elucidates molecular structures from spectral information, is the core step in understanding experimental results and closing the loop. However, current approaches usually split the task into either database-dependent retrieval or database-independent generation and neglect the inherent complementarity between the two. In this study, we propose Vib2Mol, a versatile deep learning model that bridges retrieval and generation to flexibly handle diverse spectrum-to-structure tasks according to the available prior knowledge. It not only achieves state-of-the-art performance in analyzing theoretical infrared and Raman spectra, but also outperforms previous models on experimental data. Moreover, Vib2Mol demonstrates promising capabilities in predicting reaction products and sequencing peptides, enabling vibrational spectroscopy to serve as a real-time guide for autonomous scientific discovery workflows.

Training framework of Vib2Mol

Figure 1. The framework of Vib2Mol for pretraining. (A) The alignment phase: spectra and molecular structures are represented as patch tokens and SMILES tokens, respectively. After being processed by their respective encoders, spectral and molecular information are aligned by contrastive learning (CL). Subsequently, hard negative samples are selected and used to guide the model in learning the subtle distinctions between highly similar spectra or molecules. (B) The generation phase: for conditional generation, 45% of the tokens in each molecule are randomly masked, and the masked molecule is encoded by the same molecular encoder used for spectrum-structure alignment. The molecular decoder fuses spectral information with molecular features and predicts the masked tokens. For de novo generation, the molecule is sequentially masked and fed directly into the same molecular decoder used for conditional generation, without prior encoding. The decoder then predicts the next token on the basis of the preceding tokens, spectral features, and the chemical formula (if given).
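The two pretraining ingredients in the caption above, contrastive alignment with hard negatives and random token masking, can be illustrated with a minimal sketch. This is not the Vib2Mol implementation: the symmetric InfoNCE form, the cosine similarity, the `temperature` value, the "most similar non-match" rule for hard negatives, and the `[MASK]` token are all assumptions chosen for illustration, and the function names are hypothetical. Only the 45% masking ratio comes from the caption.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def contrastive_loss(spec_embs, mol_embs, temperature=0.1):
    """Symmetric InfoNCE-style loss (assumed form).

    spec_embs[i] and mol_embs[i] are treated as a matching pair; every other
    combination in the batch is treated as a negative.
    """
    n = len(spec_embs)
    sims = [[cosine(s, m) / temperature for m in mol_embs] for s in spec_embs]
    loss = 0.0
    for i in range(n):
        spectrum_to_mol = sims[i]                       # row i
        mol_to_spectrum = [sims[j][i] for j in range(n)]  # column i
        for logits in (spectrum_to_mol, mol_to_spectrum):
            log_norm = math.log(sum(math.exp(x) for x in logits))
            loss += log_norm - logits[i]  # cross-entropy with target index i
    return loss / (2 * n)


def hardest_negative(sims_row, positive_index):
    """Index of the most similar non-matching candidate (assumed selection rule)."""
    candidates = [j for j in range(len(sims_row)) if j != positive_index]
    return max(candidates, key=lambda j: sims_row[j])


def mask_tokens(tokens, ratio=0.45, mask_token="[MASK]", seed=0):
    """Replace a random `ratio` of the tokens with `mask_token`."""
    rng = random.Random(seed)
    n_mask = round(len(tokens) * ratio)
    masked = set(rng.sample(range(len(tokens)), n_mask))
    return [mask_token if i in masked else t for i, t in enumerate(tokens)]
```

In this sketch, a batch whose spectrum and molecule embeddings line up pairwise yields a lower loss than a batch whose pairs are shuffled, which is the signal that pulls matching pairs together during alignment.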

Inference workflow of Vib2Mol

Figure 2. The workflow of Vib2Mol for addressing different spectrum-to-structure tasks: (A) spectrum-spectrum retrieval, where only the spectral encoder is used to calculate the similarity between spectral pairs; (B) spectrum-structure retrieval, where spectra and molecules are encoded by their respective encoders to determine spectrum-structure similarity; (C) conditional generation and (D) de novo generation, both following the same workflows as in the pretraining stage; (E) a re-ranking module for refining retrieval and generation results. It first filters candidates by chemical formula (if available), then uses the pre-trained molecular encoder to score them against the query spectrum. The highest-scoring candidates are selected as output.
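The re-ranking step in panel (E) can be sketched as a filter followed by a similarity sort. The snippet below is a hedged illustration only: it assumes the embeddings have already been produced by the encoders, uses cosine similarity as the score, and relies on a hypothetical candidate schema (`smiles`, `formula`, `emb`) and a hypothetical `top_k` parameter, none of which are specified in the caption.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(query_emb, candidates, formula=None, top_k=3):
    """Filter candidates by formula (if given), then rank by similarity to the query.

    candidates: list of dicts with keys 'smiles', 'formula', 'emb'
                (hypothetical schema for this sketch).
    Returns the SMILES strings of the top_k highest-scoring candidates.
    """
    if formula is not None:
        # Step 1: keep only candidates consistent with the known formula.
        candidates = [c for c in candidates if c["formula"] == formula]
    # Step 2: score each remaining candidate against the query spectrum embedding.
    ranked = sorted(
        candidates,
        key=lambda c: cosine(query_emb, c["emb"]),
        reverse=True,
    )
    # Step 3: return the highest-scoring candidates.
    return [c["smiles"] for c in ranked[:top_k]]
```

Because the same function accepts candidates from either a database lookup or a generator, it mirrors how a single re-ranking module could refine both retrieval and generation outputs; passing `formula=None` covers the case where no chemical formula is available.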

SOTA performance on different benchmarks

Figure 3. Performance evaluation of advanced deep learning models. (A) and (B) compare the performance of various models on spectrum-to-structure retrieval and de novo molecular generation, respectively. These evaluations were conducted on both theoretical (VB, QM9S) and experimental (NIST, SDBS) benchmarks. The impact of multi-modal spectral input on the performance of Vib2Mol is further detailed in (C) for retrieval and (D) for generation.

Further applications on product prediction of chemical reactions

Figure 4. Workflow and performance of Vib2Mol in product elucidation and mixed-spectrum analysis. (A) Three scenarios for predicting products. (B) Benchmarking on PAH substitution reactions. (C) Retrieval and de novo generation results on unmixed and mixed spectra of general chemical reactions.

Citation

If you find our work useful, please consider citing our paper:

@article{lu2025vib2mol,
  title={Vib2Mol: from vibrational spectra to molecular structures-a versatile deep learning model},
  author={Xinyu Lu and Hao Ma and Hui Li and Jia Li and Yuqiang Li and Tong Zhu and Guokun Liu and Bin Ren},
  year={2025},
  url={https://arxiv.org/abs/2503.07014}
}

This work was supported by the National Natural Science Foundation of China (Grant Nos. 22227802, 22021001, 22474117, and 22272139), the Fundamental Research Funds for the Central Universities (20720220009 and 20720250005), and the Shanghai Innovation Institute.