Planetary Atmospheric Spectrum Retrieval based on Machine Learning

November 2024 - December 2024

Course Project, School of Physics, Peking University, Beijing, China

This was a course project for “Numerical Simulation and AI Forecast of Geophysical Fluids”, taught by Prof. Qiu Yang and Prof. Xinyu Wen.

Slides (in Mandarin for class presentation)

Motivation

Given the atmospheric profile and composition, computing the transmission or emission spectrum is in principle straightforward using radiative transfer theory. The inverse problem, retrieving the atmospheric thermodynamic structure and composition from an observed spectrum, is much harder because the relation between the spectrum and the atmospheric state is highly nonlinear. The mainstream approach to the retrieval problem is Markov chain Monte Carlo (MCMC). Because this method has to compute a spectrum at every step, it is extremely time-consuming and computationally expensive. AI is well suited to modeling nonlinear relations, so it is interesting to see whether it can address this problem.

Method

The method can be divided into two parts: data preparation and model training.

Data Preparation

Training AI models properly requires a large, high-quality dataset. However, such a dataset is difficult to find, so I decided to generate my own for the subsequent training.

The radiative transfer model I chose was pyratbay, a Python tool for computing radiative-transfer spectra and fitting exoplanet atmospheric properties.

To generate a single set of data with pyratbay, a sequence of operations had to be carried out, including setting up the configuration file for the atmospheric thermal structure and composition and the configuration of the spectrum (type, resolution, etc.).

Fig. 1. Madhu profile

For the atmospheric temperature profile, I used the Madhu profile (Madhusudhan & Seager, 2009) that is built into the model, because an isothermal profile would just give the trivial blackbody spectrum. For the atmospheric composition, I selected 18 gases, defining 8 of them as major gases and the other 10 as minor gases. In summary, there are 24 parameters for each set of data:

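  • 6 parameters from the Madhu profile
  • Concentrations of 8 major gases: $\mathrm{H_2O}, \mathrm{CO_2}, \mathrm{N_2}, \mathrm{O_2}, \mathrm{CH_4}, \mathrm{He}, \mathrm{H_2}, \mathrm{NH_3}$
  • Concentrations of 10 minor gases: $\mathrm{O_3}, \mathrm{PH_3}, \mathrm{CO}, \mathrm{SO_2}, \mathrm{HCN}, \mathrm{H_2S}, \mathrm{NO}, \mathrm{N_2O}, \mathrm{HCl}, \mathrm{C_2H_2}$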
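
To make the six Madhu parameters concrete, here is a minimal sketch of the three-layer temperature profile from Madhusudhan & Seager (2009), written from the published equations rather than taken from pyratbay. The function name, the parameter names, and the top-of-atmosphere pressure `p0` are assumptions for illustration; pyratbay's own implementation may differ in details such as units, parameter order, or smoothing of the layer boundaries.

```python
import math


def madhu_temperature(pressure, log_p1, log_p2, log_p3, a1, a2, t0, p0=1e-8):
    """Temperature (K) at `pressure` (bar) for an unsmoothed three-layer profile.

    Layer 1 (p0 <= P < p1):  T = t0 + (ln(P / p0) / a1)**2
    Layer 2 (p1 <= P < p3):  T = t2 + (ln(P / p2) / a2)**2
    Layer 3 (P >= p3):       T = t3 (isothermal)

    log_p1, log_p2, log_p3 are log10 pressures in bar. p0 is an assumed
    top-of-atmosphere pressure, not one of the six free parameters.
    """
    p1, p2, p3 = 10.0 ** log_p1, 10.0 ** log_p2, 10.0 ** log_p3
    if not p0 < p1 < p3:
        raise ValueError("expected p0 < p1 < p3")

    # t2 and t3 are fixed by requiring continuity at p1 and p3.
    t1 = t0 + (math.log(p1 / p0) / a1) ** 2
    t2 = t1 - (math.log(p1 / p2) / a2) ** 2
    t3 = t2 + (math.log(p3 / p2) / a2) ** 2

    pressure = max(pressure, p0)  # clamp anything above the model top
    if pressure < p1:
        return t0 + (math.log(pressure / p0) / a1) ** 2
    if pressure < p3:
        return t2 + (math.log(pressure / p2) / a2) ** 2
    return t3


# Illustrative values only, not the ones used in the project.
example = dict(log_p1=-2.0, log_p2=-3.0, log_p3=1.0, a1=0.9, a2=0.6, t0=1000.0)
profile = [madhu_temperature(10.0 ** lp, **example) for lp in range(-8, 3)]
```

With `p2` below `p1`, as in this example, the temperature increases monotonically with pressure. Placing `p2` between `p1` and `p3` puts a temperature minimum inside layer 2, which gives a thermal inversion above it.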

I built a pipeline of Python and shell scripts to generate the data in batches. In total, I generated 10,800 sets of data.
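
The parameter-sampling step of such a pipeline could look roughly like the sketch below. It only draws the 24 parameters for each sample and groups them into batches; writing pyratbay configuration files and running the model are left out. All sampling ranges, the batch size of 100, the log-uniform sampling, and the function names are placeholders assumed for illustration, not the settings used in the project.

```python
import random

MAJOR_GASES = ["H2O", "CO2", "N2", "O2", "CH4", "He", "H2", "NH3"]
MINOR_GASES = ["O3", "PH3", "CO", "SO2", "HCN", "H2S", "NO", "N2O", "HCl", "C2H2"]

# Placeholder ranges for the six Madhu parameters (assumed, not the project's).
MADHU_RANGES = {
    "log_p1": (-4.0, -1.0),
    "log_p2": (-6.0, -1.0),
    "log_p3": (0.0, 2.0),
    "a1": (0.5, 1.5),
    "a2": (0.3, 1.0),
    "T0": (500.0, 1500.0),
}


def sample_parameters(rng):
    """Draw one 24-parameter set: 6 profile parameters + 18 log10 abundances."""
    params = {name: rng.uniform(lo, hi) for name, (lo, hi) in MADHU_RANGES.items()}
    for gas in MAJOR_GASES:
        params[f"log_{gas}"] = rng.uniform(-4.0, -0.5)   # assumed range
    for gas in MINOR_GASES:
        params[f"log_{gas}"] = rng.uniform(-10.0, -5.0)  # assumed range
    return params


def generate_batches(n_total, batch_size, seed=0):
    """Yield lists of parameter sets until n_total samples have been produced."""
    rng = random.Random(seed)
    for start in range(0, n_total, batch_size):
        size = min(batch_size, n_total - start)
        yield [sample_parameters(rng) for _ in range(size)]


batches = list(generate_batches(n_total=10800, batch_size=100))
```

Each batch could then be handed to a shell script that runs the radiative transfer model once per parameter set, which keeps the sampling logic separate from the model runs.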

Model Training

I tried 6 models in total: Linear Regression, Random Forest, MLP, 1D CNN, 2D CNN, and Transformer.
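
As an illustration of the simplest of these models, the sketch below fits a linear regression from spectra to the 24 parameters using NumPy least squares. The data are synthetic random arrays standing in for the real spectra; the number of spectral bins (50), the sample count, the train/test split, and the function names are all assumptions, and the result says nothing about the performance obtained in the project.

```python
import numpy as np


def fit_linear_baseline(spectra, params):
    """Least-squares fit from standardized spectra (plus a bias term) to parameters."""
    mean = spectra.mean(axis=0)
    std = spectra.std(axis=0) + 1e-12  # avoid division by zero for flat bins
    design = np.hstack([(spectra - mean) / std, np.ones((len(spectra), 1))])
    coef, *_ = np.linalg.lstsq(design, params, rcond=None)
    return mean, std, coef


def predict(model, spectra):
    """Apply the training-set standardization and the fitted coefficients."""
    mean, std, coef = model
    design = np.hstack([(spectra - mean) / std, np.ones((len(spectra), 1))])
    return design @ coef


# Synthetic stand-in data: 500 "spectra" with 50 bins and 24 target parameters.
rng = np.random.default_rng(0)
true_weights = rng.normal(size=(50, 24))
spectra = rng.normal(size=(500, 50))
params = spectra @ true_weights + 0.01 * rng.normal(size=(500, 24))

# Hold out the last 100 samples for evaluation.
model = fit_linear_baseline(spectra[:400], params[:400])
predictions = predict(model, spectra[400:])
rmse = float(np.sqrt(np.mean((predictions - params[400:]) ** 2)))
```

Because the synthetic targets are linear in the inputs, this baseline fits them almost perfectly. Real spectra depend nonlinearly on the atmospheric state, which is the motivation for also trying the nonlinear models in the list above.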