Artificial Intelligence at the Service of Astrochemistry

Artificial intelligence can help detect the precursors of life, according to research carried out at the Laboratoire d’Astrophysique de Bordeaux. The work has been published in the journal Astronomy & Astrophysics and uses artificial neural networks to study the molecular content of the interstellar medium, the gas and other material filling the space between star systems within a galaxy (Kessler et al. 2025).

Artificial intelligence (AI) is widely known for its use in generative models that produce text or images; in recent years, however, it has also been increasingly used in astrophysics to speed up the analysis of large amounts of data. But how can this technology help us study the chemistry of the interstellar medium? Artificial neural networks (ANNs), a subclass of AI, are inspired by the brain’s neuronal network. Such a network can statistically find relationships between input and output data to constrain a model that meets our needs. Like a student, an ANN learns through trial and error, slowly mastering a task until it achieves a satisfactory level of performance. This is done by adjusting the connection weights between the neurons to minimize an error. After training, we obtain an ANN model capable of producing a prediction for given data, based on the relationships it has learned.
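To make the idea of "adjusting connection weights to minimize an error" concrete, here is a minimal sketch with a single artificial neuron. It is not the network from the publication: the toy task, the learning rate, the number of passes, and all function names are assumptions chosen purely for illustration.

```python
import math
import random


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Output of one neuron: weighted sum of the inputs, then a sigmoid."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_neuron(samples, labels, lr=0.5, epochs=2000, seed=0):
    """Fit one sigmoid neuron by gradient descent on the cross-entropy error."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in samples[0]]
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # For a sigmoid output with cross-entropy error, the gradient
            # with respect to the weighted sum is simply (prediction - label).
            err = predict(weights, bias, x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


# Hypothetical toy task: the label is 1 when a "line" appears in either channel.
samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
labels = [0, 1, 1, 1]

weights, bias = train_neuron(samples, labels)
scores = [predict(weights, bias, x) for x in samples]
```

After training, the score for the empty input falls below 0.5 and the scores for the other three inputs rise above it. A real ANN repeats the same update rule across many neurons and layers.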

Hot protostellar envelopes around forming stars are chemically rich astronomical sources that are studied to understand the origin and evolution of their physical and chemical properties. These sources are observed using state-of-the-art telescopes, such as the Atacama Large Millimeter/submillimeter Array (ALMA) and the NOrthern Extended Millimeter Array (NOEMA), which are powerful enough to resolve small-scale structures and detect signals from low-abundance molecular species. Molecules in space emit light at very specific frequencies, producing characteristic emission lines that appear in a spectrum and are used to identify the chemical composition of the emitting source (see left panel of Figure 1). Hot protostellar envelopes are among the objects displaying the most complex species, such as C3H7CN, detected near the large molecular cloud Sagittarius B2(N) (Belloche et al. 2013), or (CH2OH)2, detected toward the nebula Orion-KL (Jørgensen et al. 2020). These regions could therefore potentially host precursors of life such as amino acids (for example, glycine). Still, attributing all the emission lines to known, and sometimes unknown, species by analyzing wide-band, high-resolution spectra can be tedious work. So, what if AI could do this work for us?
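The manual approach described above, matching each observed line to a catalogue of known rest frequencies, can be sketched in a few lines. This is only an illustration under simplifying assumptions: the three-entry catalogue uses approximate rest frequencies of well-known transitions of simple molecules (not the complex species discussed in the article), the observed frequencies are invented for the example and assumed to be already corrected for the velocity of the source, and the tolerance and the function name `identify_lines` are hypothetical.

```python
# Approximate rest frequencies in GHz of three well-known transitions.
# A real line catalogue contains very many lines per molecule.
REST_FREQS_GHZ = {
    "CO (J=1-0)": 115.2712,
    "HCN (J=1-0)": 88.6316,
    "HCO+ (J=1-0)": 89.1885,
}


def identify_lines(observed_ghz, catalogue, tol_ghz=0.001):
    """Map each observed line frequency to the catalogue entries within tol_ghz."""
    matches = {}
    for freq in observed_ghz:
        matches[freq] = [
            name for name, rest in catalogue.items()
            if abs(freq - rest) <= tol_ghz
        ]
    return matches


# Invented example line frequencies; the last one matches nothing in the catalogue.
observed = [88.6318, 115.2710, 101.5000]
result = identify_lines(observed, REST_FREQS_GHZ)
```

The unmatched line in the example stands for the "sometimes unknown" species mentioned above. With thousands of blended lines in a wide-band spectrum, this bookkeeping quickly becomes the tedious work that motivates an automated detector.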

To do so, one needs an important ingredient: data to learn from. This training data fully determines how the ANN model will react when it sees a spectrum. Unfortunately, there are currently not enough fully analyzed spectra from hot protostellar envelopes to correctly constrain the detection task. We therefore need another way to do it: reproducing as accurately as possible what we observe, using the available means. To achieve this, the emission of twenty selected molecules was modelled using software that solves the radiative transfer equations, which describe how radiation propagates through and interacts with matter in space, in order to produce synthetic emission spectra. These spectra are then combined with simulated observational artefacts to obtain a set of synthetic spectra that mimic those observed towards hot protostellar envelopes. This dataset serves to train an ANN that has been fully developed for a molecular detection task and fine-tuned to achieve optimal precision. The result is a detector that provides insight into the molecular content of observational spectra quickly and systematically. Now, in a matter of seconds, anybody can obtain a list of molecules and their probability of being present in a spectrum, thus automating the way rich sources can be characterized and studied (see Figure 1). Future developments will extend the capabilities of this detector even further, fine-tuning it to extract more information and detect signals of rare molecules…and perhaps even the precursors of life.
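The recipe above (model the emission, add artefacts, keep track of which molecules are present) can be illustrated with a deliberately simplified sketch. It does not solve the radiative transfer equations: each line is replaced by a plain Gaussian profile, the only artefacts are random noise and a tilted baseline, and the molecules `mol_A` and `mol_B`, their line frequencies, and every parameter value are hypothetical placeholders.

```python
import math
import random


def gaussian_line(freqs, center, width, peak):
    """Gaussian stand-in for an emission line (not a radiative transfer model)."""
    return [peak * math.exp(-0.5 * ((f - center) / width) ** 2) for f in freqs]


def synthetic_spectrum(freqs, catalogue, present, noise_rms=0.05, seed=0):
    """Build one labelled training example: a noisy spectrum plus 0/1 labels."""
    rng = random.Random(seed)
    spectrum = [0.0] * len(freqs)
    for name in present:
        for center, peak in catalogue[name]:
            line = gaussian_line(freqs, center, width=0.002, peak=peak)
            spectrum = [s + v for s, v in zip(spectrum, line)]
    # Simulated artefacts: Gaussian noise and a randomly tilted linear baseline.
    slope = rng.uniform(-0.1, 0.1)
    span = freqs[-1] - freqs[0]
    spectrum = [
        s + rng.gauss(0.0, noise_rms) + slope * (f - freqs[0]) / span
        for s, f in zip(spectrum, freqs)
    ]
    labels = [1 if name in present else 0 for name in catalogue]
    return spectrum, labels


# Hypothetical catalogue: (line centre in GHz, peak intensity) per molecule.
catalogue = {
    "mol_A": [(100.010, 1.0), (100.050, 0.6)],
    "mol_B": [(100.030, 0.8)],
}
freqs = [100.0 + 0.0005 * i for i in range(200)]

spectrum, labels = synthetic_spectrum(freqs, catalogue, present=["mol_A"])
```

Calling `synthetic_spectrum` many times with different seeds and different combinations of molecules yields a labelled dataset of the kind an ANN can be trained on. In this sketch the labels are known exactly because the spectra are generated rather than observed.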

 

Figure 1. Scheme of the ANN, composed of convolutional and dense layers. The input is an example of a synthetic spectrum. The convolutional layers extract relevant information, which the dense layers then use to make a prediction. The output is a list of scores between 0 and 1, independent of one another, each representing the likelihood that a given molecule is present.
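The data flow in Figure 1 can be sketched as a tiny forward pass: convolution, then a dense layer, then one score per molecule. This is an illustration only. The kernels, weights, and biases are hand-picked hypothetical values rather than trained ones, the max-pooling step is added for brevity, and the use of a sigmoid on each output is an assumption about how independent scores between 0 and 1 could be produced.

```python
import math


def conv1d(signal, kernel):
    """Slide the kernel along the signal (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Keep positive responses and set negative ones to zero."""
    return [max(0.0, v) for v in values]


def dense_sigmoid(features, weights, biases):
    """One dense layer with a separate sigmoid per output (one per molecule)."""
    scores = []
    for w_row, b in zip(weights, biases):
        z = sum(w * f for w, f in zip(w_row, features)) + b
        scores.append(1.0 / (1.0 + math.exp(-z)))
    return scores


def forward(spectrum, kernels, weights, biases):
    """Convolution -> ReLU -> global max pooling -> dense layer -> scores."""
    features = [max(relu(conv1d(spectrum, k))) for k in kernels]
    return dense_sigmoid(features, weights, biases)


# Hypothetical toy input: one narrow peak and one weaker, broad feature.
spectrum = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.5, 0.5, 0.5, 0.0]
kernels = [[-0.5, 1.0, -0.5], [1 / 3, 1 / 3, 1 / 3]]  # narrow-peak and broad filters
weights = [[4.0, 0.0], [0.0, 4.0]]  # hand-picked, not trained
biases = [-2.0, -3.0]

scores = forward(spectrum, kernels, weights, biases)
```

Because each output has its own sigmoid, the scores are not forced to sum to 1, so several molecules can receive a high score for the same spectrum.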

 
 

This article made use of the following publications:

  • Kessler, N. et al. 2025, A&A, 704, A324.
  • Belloche, A. et al. 2013, A&A, 559, A47.
  • Jørgensen, J.K. et al. 2020, ARA&A, 58, 727.
Original Contributor

    Dr. Nina Kessler

    Laboratoire d'Astrophysique de Bordeaux

    Nina completed her PhD at the Laboratoire d'Astrophysique de Bordeaux and is interested in the physical and chemical processes involved in star-forming regions. A significant part of her work concentrates on developing machine learning methods to facilitate the analysis of large volumes of observational data.

    Editors

    Katy Wetzel

    Science Editor

    Annika Geiger

    Senior Editor

    Mélisse Bonfand

    Science Editor

    Mélisse Bonfand

    Origins Postdoctoral Fellow at The University of Virginia.
