Sina's Blog

ICML 2025 AI4Science Contribution Report

By Sina on Jul 26, 2025

The International Conference on Machine Learning (ICML) 2025 took place in Vancouver from July 13th to 19th. Researchers from across the field gathered for a week to present and discuss the state of the art in Artificial Intelligence and Machine Learning. If we were to group the topics at the conference, the main headings would be Optimization, Algorithms (mostly Reinforcement Learning), LLMs & Transformers, and Applications (in areas like biology, chemistry, classification, diffusion, etc.).

What is AI4Science?

In this article, we will focus on what happened in the field of AI for Science at ICML. First, let’s talk about the term “AI for Science.” It mostly refers to applying data-driven methods such as deep learning to the forecasting, modeling, parameter tuning, and control of physical phenomena that are often difficult to treat analytically with pure math. This is not a new pursuit; since the first civilizations, humans have interpreted the world around them through observation and gathered data, searching for patterns and logical explanations of nature. The novelty is that we now have a newly capable, data-driven tool, machine learning, for interpreting such data and extracting sophisticated patterns from massive numbers of observations.

Until now, we’ve seen how good ML is at tasks like natural language processing, image generation, and classification. However, when tasked with, for example, forecasting the motion of a water flow from its exact physical description, these models can lack accuracy and fail to respect the precise physics behind the phenomenon. This is where the field of AI for Science, or Scientific Machine Learning, comes in. Communities from departments such as mathematics, computer science, the natural sciences, and engineering are conducting interdisciplinary research to leverage machine learning approaches, tackle the problems above, and harness the benefits of AI’s speed, accuracy, and intelligence.

AI4Science at ICML 2025

To start this section, let’s list the AI4Science papers (well, most of them; I may have missed some!) that were presented at ICML 2025:

| Paper Name | First Author | Subcategory |
|---|---|---|
| Linearization Turns Neural Operators into Function-Valued Gaussian Processes | Emilia Magnani | Neural Operators |
| A Bregman Proximal Viewpoint on Neural Operators | Abdel-Rahim Mezidi | Neural Operators |
| Maximum Update Parametrization and Zero-Shot Hyperparameter Transfer for Fourier Neural Operators | Shanda Li | Neural Operators |
| Shifting Time: Time-series Forecasting with Khatri-Rao Neural Operators | Srinath Dama | Modeling Dynamical Systems & Time-Series |
| Optimization for Neural Operators can Benefit from Width | Pedro Cisneros-Velarde | Neural Operators |
| Accelerating PDE-Constrained Optimization by the Derivative of Neural Operators | Ze Cheng | Neural Operators |
| CoPINN: Cognitive Physics-Informed Neural Networks | Siyuan Duan | Physics-Informed Learning (PINNs) |
| Sub-Sequential Physics-Informed Learning with State Space Model | Chenhui Xu | Physics-Informed Learning (PINNs) |
| Causal-PIK: Causality-based Physical Reasoning with a Physics-Informed Kernel | Carlota Parés Morlans | Physics-Informed Learning (PINNs) |
| Physics-Informed DeepONets for drift-diffusion on metric graphs: simulation and parameter identification | Jan Blechschmidt | Physics-Informed Learning (PINNs) |
| Refined generalization analysis of the Deep Ritz Method and Physics-Informed Neural Networks | Xianliang Xu | Physics-Informed Learning (PINNs) |
| Physics-Informed Generative Modeling of Wireless Channels | Benedikt Böck | Physics-Informed Learning (PINNs) |
| Physics-informed Temporal Alignment for Auto-regressive PDE Foundation Models | Congcong Zhu | Physics-Informed Learning (PINNs) |
| Physics-Informed Weakly Supervised Learning For Interatomic Potentials | Makoto Takamoto | Physics-Informed Learning (PINNs) |
| Calibrated Physics-Informed Uncertainty Quantification | Vignesh Gopakumar | Physics-Informed Learning (PINNs) |
| Near-optimal Sketchy Natural Gradients for Physics-Informed Neural Networks | Maricela Best Mckay | Physics-Informed Learning (PINNs) |
| A Physics-Informed Machine Learning Framework for Safe and Optimal Control of Autonomous Systems | Manan Tayal | Physics-Informed Learning (PINNs) |
| Geometric and Physical Constraints Synergistically Enhance Neural PDE Surrogates | Yunfei Huang | Physics-Informed Learning (PINNs) |
| PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations | Benjamin Holzschuh | Foundation Models & Large-Scale Architectures |
| Neural Interpretable PDEs: Harmonizing Fourier Insights with Attention for Scalable and Interpretable Physics Discovery | Ning Liu | Automated Scientific Discovery |
| From Uncertain to Safe: Conformal Adaptation of Diffusion Models for Safe PDE Control | Peiyan Hu | Foundation Models & Large-Scale Architectures |
| Holistic Physics Solver: Learning PDEs in a Unified Spectral-Physical Space | Xihang Yue | Neural Operators |
| Unisolver: PDE-Conditional Transformers Towards Universal Neural PDE Solvers | Hang Zhou | Foundation Models & Large-Scale Architectures |
| MultiPDENet: PDE-embedded Learning with Multi-time-stepping for Accelerated Flow Simulation | Qi Wang | Physics-Informed Learning (PINNs) |
| Toward Efficient Kernel-Based Solvers for Nonlinear PDEs | Zhitong Xu | Neural Operators |
| Active Learning with Selective Time-Step Acquisition for PDEs | Yegon Kim | Physics-Informed Learning (PINNs) |
| PINNsAgent: Automated PDE Surrogation with Large Language Models | Qingpo Wuwu | Foundation Models & Large-Scale Architectures |
| PDE-Controller: LLMs for Autoformalization and Reasoning of PDEs | Mauricio Soroco | Foundation Models & Large-Scale Architectures |
| M2PDE: Compositional Generative Multiphysics and Multi-component PDE Simulation | Tao Zhang | Physics-Informed Learning (PINNs) |
| Transolver++: An Accurate Neural Solver for PDEs on Million-Scale Geometries | Huakun Luo | Foundation Models & Large-Scale Architectures |
| Zebra: In-Context Generative Pretraining for Solving Parametric PDEs | Louis Serrano | Foundation Models & Large-Scale Architectures |
| Mechanistic PDE Networks for Discovery of Governing Equations | Adeel Pervez | Automated Scientific Discovery |
| Curvature-aware Graph Attention for PDEs on Manifolds | Yunfeng Liao | Physics-Informed Learning (PINNs) |
| PEINR: A Physics-enhanced Implicit Neural Representation for High-Fidelity Flow Field Reconstruction | Liming Shen | Physics-Informed Learning (PINNs) |
| Efficient and Scalable Density Functional Theory Hamiltonian Prediction through Adaptive Sparsity | Erpai Luo | Physics-Informed Learning (PINNs) |
| Neural Discovery in Mathematics: Do Machines Dream of Colored Planes? | Konrad Mundinger | Automated Scientific Discovery |
| QuanONet: Quantum Neural Operator with Application to Differential Equation | Ruocheng Wang | Neural Operators |
| Discovering Physics Laws of Dynamical Systems via Invariant Function Learning | Shurui Gui | Automated Scientific Discovery |
| Inverse problems with experiment-guided AlphaFold | Sai Advaith Maddipatla | Automated Scientific Discovery |
| Symmetry-Driven Discovery of Dynamical Variables in Molecular Simulations | Jeet Mohapatra | Automated Scientific Discovery |
| Skip the Equations: Learning Behavior of Personalized Dynamical Systems Directly From Data | Krzysztof Kacprzyk | Modeling Dynamical Systems & Time-Series |
| Chaos Meets Attention: Transformers for Large-Scale Dynamical Prediction | Yi He | Modeling Dynamical Systems & Time-Series |
| Transformative or Conservative? Conservation laws for ResNets and Transformers | Sibylle Marcotte | Foundation Models & Large-Scale Architectures |
| Thermalizer: Stable autoregressive neural emulation of spatiotemporal chaos | Chris Pedersen | Modeling Dynamical Systems & Time-Series |

Now, let’s look at some papers that stood out as strong contributions to this field and briefly explain their work:

Thermalizer: Stable autoregressive neural emulation of spatiotemporal chaos [1]

[Figure: Thermalizer visualization]

This paper focuses on inference time, when we roll out forecasts autoregressively and a major problem arises: cumulative error. Thermalizer is a new approach that trains a secondary model alongside the main forecaster; by learning dynamical features of the system, it keeps the predictions consistent with the physics of the problem, correcting errors and pulling the rollout back onto the right trajectory after each prediction step. Find more on arXiv.
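To make the idea concrete, here is a minimal sketch (not the paper’s code) of the general pattern the text describes: an autoregressive rollout in which a secondary corrector model is applied every few steps to counteract cumulative error. The `forecaster`, `corrector`, and `correct_every` names are placeholders for illustration only.

```python
# A minimal sketch (not the paper's code) of a corrected autoregressive rollout.
import torch


def corrected_rollout(forecaster, corrector, x0, n_steps, correct_every=10):
    """Roll out `forecaster` autoregressively, applying `corrector` every
    `correct_every` steps to counteract cumulative error.

    forecaster: callable mapping the state at time t to the state at t+1
    corrector:  callable mapping a (possibly drifted) state to a corrected state
    x0:         initial state tensor of shape (batch, *state_dims)
    """
    states, x = [x0], x0
    for step in range(1, n_steps + 1):
        x = forecaster(x)              # one autoregressive prediction step
        if step % correct_every == 0:
            x = corrector(x)           # nudge the rollout back on trajectory
        states.append(x)
    return torch.stack(states, dim=1)  # (batch, n_steps + 1, *state_dims)
```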

Chaos Meets Attention: Transformers for Large-Scale Dynamical Prediction [2]


This paper introduces a new deep learning framework that leverages the power of Transformer architectures to perform large-scale, long-term forecasting of chaotic dynamical systems. The authors propose a method that combines a parallel-in-time, auto-regressive training scheme with a novel attention mechanism designed to capture the complex spatiotemporal dependencies inherent in chaotic behaviour. Their model aims to overcome the limitations of recurrent neural networks (RNNs) and other approaches, which often struggle with error accumulation during prediction. Find more on arXiv.
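As a rough illustration of the general setup (not the authors’ architecture), the sketch below embeds each spatial snapshot as one token and trains a causal transformer to predict the next snapshot at every position of the window in parallel, which is the teacher-forced, parallel-in-time alternative to unrolling an RNN step by step. All module names and sizes here are arbitrary choices for the example.

```python
# A generic sketch, not the authors' model: one token per spatial snapshot,
# a causal transformer, and a training loss over all time steps in parallel.
import torch
import torch.nn as nn


class NextStepTransformer(nn.Module):
    def __init__(self, n_grid, d_model=128, n_heads=4, n_layers=4):
        super().__init__()
        self.embed = nn.Linear(n_grid, d_model)        # snapshot -> token
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_grid)         # token -> next snapshot

    def forward(self, u):                              # u: (batch, T, n_grid)
        T = u.shape[1]
        causal = torch.triu(torch.full((T, T), float("-inf"), device=u.device), 1)
        h = self.encoder(self.embed(u), mask=causal)
        return self.head(h)                            # prediction of u[t+1] at each t


# Teacher-forced, parallel-in-time training step on toy trajectories.
model = NextStepTransformer(n_grid=256)
u = torch.randn(8, 32, 256)                            # (batch, time, grid points)
loss = nn.functional.mse_loss(model(u[:, :-1]), u[:, 1:])
loss.backward()
```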

Zebra: In-Context Generative Pretraining for Solving Parametric PDEs [3]


This is a new kind of approach that brings in-context learning, familiar from natural language processing, to PDE and dynamical-system prediction. A transformer model is conditioned on a few example trajectories of a specific PDE (e.g., Navier-Stokes) and is then asked to predict the next steps of a new trajectory given its first few steps. The paper focuses on interpolation, i.e., in-distribution prediction on systems the model was trained on, but future work could extend it to out-of-distribution prediction, or even train a foundation model across different PDEs and obtain a multi-system surrogate. Find more on arXiv.
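The in-context recipe itself is easy to picture. Below is a minimal sketch of the prompt layout and autoregressive continuation only; it is not Zebra’s actual tokenizer or model, and the `model` interface (a sequence of snapshots in, a next-snapshot prediction per position out) is an assumption, e.g. a next-step transformer like the one sketched above.

```python
# Prompt layout sketch only: a few example trajectories from one PDE, followed
# by the first steps of a new trajectory, continued autoregressively. The
# `model` interface is an assumption, not Zebra's actual tokenizer/model.
import torch


def build_prompt(context_trajs, query_prefix):
    """context_trajs: list of tensors (T_i, n_grid), full example trajectories
    query_prefix:  tensor (T_q, n_grid), the first steps to be continued"""
    return torch.cat(context_trajs + [query_prefix], dim=0)


def continue_trajectory(model, prompt, n_future):
    """Autoregressively generate n_future snapshots after the prompt."""
    seq, preds = prompt, []
    for _ in range(n_future):
        nxt = model(seq.unsqueeze(0))[:, -1]   # predicted next snapshot, (1, n_grid)
        preds.append(nxt.squeeze(0))
        seq = torch.cat([seq, nxt], dim=0)     # append and keep generating
    return torch.stack(preds)                  # (n_future, n_grid)
```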

Provable Length Generalization in Sequence Prediction via Spectral Filtering [4]


A great contribution on linear dynamical systems: a data-driven method that learns spectral filters for the system via gradient-based learning and achieves length generalization. The paper also introduces “tensorized spectral filters” and provides algorithms with provable generalization guarantees. During the presentation, most questions were about the relationship between this method and the Koopman operator approach: the Koopman operator learns a global model of the system’s underlying dynamics, while the spectral filtering method here learns a direct predictive function for the output sequence by minimizing regret, without needing to model the entire system. Find more on arXiv.
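For readers unfamiliar with spectral filtering, here is a simplified sketch of the classical recipe this work builds on: fixed filters taken from the eigenvectors of a particular Hankel matrix, and a prediction that is linear in the filtered input history. It does not include the tensorized filters or the regret analysis that are the paper’s actual contributions, and the eigenvalue scaling and least-squares fit below are simplifications on my part.

```python
# Simplified spectral-filtering recipe (not the paper's tensorized filters):
# fixed filters from a Hankel matrix, prediction linear in the filtered history.
import numpy as np


def spectral_filters(T, k):
    """Top-k eigenvectors of the T x T Hankel matrix Z_ij = 2 / ((i+j)^3 - (i+j)),
    scaled by eigenvalue^(1/4) as in the classical construction."""
    idx = np.arange(1, T + 1)
    s = idx[:, None] + idx[None, :]
    Z = 2.0 / (s ** 3 - s)
    vals, vecs = np.linalg.eigh(Z)             # ascending eigenvalues
    return vecs[:, -k:] * vals[-k:] ** 0.25    # shape (T, k)


def filtered_features(x, filters):
    """x: (N, d) input sequence. Feature block j at time t is the history
    x_{t-1}, ..., x_{t-T} weighted by filter j. Returns (N, k * d)."""
    T, k = filters.shape
    N, d = x.shape
    feats = np.zeros((N, k * d))
    for t in range(N):
        past = x[max(0, t - T):t][::-1]        # most recent first
        hist = np.zeros((T, d))
        hist[:len(past)] = past
        feats[t] = (filters.T @ hist).ravel()
    return feats


# Toy usage: fit a linear map from filtered features to the target sequence.
filters = spectral_filters(T=64, k=8)
x = np.random.randn(500, 3)                    # toy input sequence
y = 0.1 * np.cumsum(x, axis=0)                 # toy target sequence
W, *_ = np.linalg.lstsq(filtered_features(x, filters), y, rcond=None)
```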

A Bregman Proximal Viewpoint on Neural Operators [5]


In this paper, we see one step forward in the theory of neural operators. A. Mezidi et al. introduce a new component at each layer (from a practical point of view): the inverse of the activation function, which adds extra non-linearity to the learning process. Theoretically, the method connects activation functions to Bregman proximity operators, where the specific activation is determined by the choice of regularization. In practice it improves performance in deeper models by allowing layers to reduce more easily to an identity mapping, and it yields a sparser set of weights. Find more in the paper.
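One way to read the practical component described above is the layer sketched below, assuming a sigmoid-like invertible activation: the inverse activation of the layer input is added to the affine term before the activation is applied, so a layer with zero weights and bias reduces exactly to the identity. This is an illustrative reading, not the authors’ reference implementation.

```python
# Illustrative reading only (assumes a sigmoid activation, not the authors' code):
# each layer adds the inverse activation of its input to the affine term, so a
# layer with zero weights and bias is exactly the identity on (0, 1).
import torch
import torch.nn as nn


class BregmanStyleLayer(nn.Module):
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        nn.init.zeros_(self.linear.weight)       # start every layer near identity
        nn.init.zeros_(self.linear.bias)
        self.eps = eps

    def forward(self, x):
        x = x.clamp(self.eps, 1 - self.eps)      # keep inputs in sigmoid's range
        pre = torch.logit(x) + self.linear(x)    # sigma^{-1}(x) + W x + b
        return torch.sigmoid(pre)                # sigma( ... )


# With zero weights, a stack of these layers is an identity map on (0, 1).
block = nn.Sequential(*[BregmanStyleLayer(16) for _ in range(4)])
x = torch.rand(8, 16) * 0.98 + 0.01
assert torch.allclose(block(x), x, atol=1e-4)
```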

Shifting Time: Time-series Forecasting with Khatri-Rao Neural Operators [6]


A good contribution from the University of Toronto on forecasting: the method learns a continuous time-shift operator that maps a system’s history to its future values. The authors introduce a new architecture, Khatri-Rao Neural Operators (KRNOs), with nearly linear computational scaling. The framework naturally handles irregularly sampled observations and enables super-resolution forecasting in both space and time. Find more in the paper.
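To see what that buys you in practice, here is a toy illustration of the continuous-time interface such an operator exposes. This is not the KRNO architecture, only the calling convention it enables: the history is a set of (time, value) pairs at irregular times, and the forecast can be queried at arbitrary, even densely super-resolved, future times.

```python
# Toy interface illustration (NOT the KRNO architecture): irregular (time, value)
# history in, forecasts at arbitrary query times out.
import torch
import torch.nn as nn


class ContinuousTimeForecaster(nn.Module):
    def __init__(self, d_hidden=64):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(2, d_hidden), nn.ReLU(),
                                    nn.Linear(d_hidden, d_hidden))
        self.decode = nn.Sequential(nn.Linear(d_hidden + 1, d_hidden), nn.ReLU(),
                                    nn.Linear(d_hidden, 1))

    def forward(self, hist_t, hist_y, query_t):
        # Permutation-invariant summary of the irregularly sampled history.
        h = self.encode(torch.stack([hist_t, hist_y], dim=-1)).mean(dim=-2)
        # Evaluate the forecast at every query time.
        h = h.unsqueeze(-2).expand(*query_t.shape, h.shape[-1])
        out = self.decode(torch.cat([h, query_t.unsqueeze(-1)], dim=-1))
        return out.squeeze(-1)


model = ContinuousTimeForecaster()
hist_t = torch.sort(torch.rand(4, 20)).values          # irregular observation times
hist_y = torch.sin(6.0 * hist_t)                       # observed values
query_t = torch.linspace(1.0, 2.0, 50).expand(4, 50)   # dense future time grid
pred = model(hist_t, hist_y, query_t)                  # (4, 50)
```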

SAND: One-Shot Feature Selection with Additive Noise Distortion [7]

This one is a little off topic and a more general contribution, but we found it helpful. SAND (Selection with Additive Noise Distortion) is a simple feature selection method that selects the most effective features at the start of the learning workflow, although it can also be placed at other points in the pipeline. Its main contribution is a non-intrusive input layer that applies trainable gains and weighted Gaussian noise to each feature, governed by a normalization constraint that forces the sum of squared gains to equal the desired number of features, k. Find more on arXiv.
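Here is a sketch of that mechanism as I read the description above: a drop-in input layer with one trainable gain per feature, additive Gaussian noise whose weight shrinks as the gain grows, and a renormalization so that the squared gains sum to k. The exact noise weighting used by SAND may differ; the particular sqrt(1 - g^2) form below is an assumption made for illustration.

```python
# Sketch of a noisy-gain input layer in the spirit of the description above.
# The renormalization enforces sum_i g_i^2 == k; the noise weighting
# sqrt(1 - g_i^2) is an assumption for illustration, not necessarily SAND's.
import torch
import torch.nn as nn


class NoisyGainSelector(nn.Module):
    def __init__(self, n_features, k, noise_std=1.0):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(n_features))
        self.k = k
        self.noise_std = noise_std

    def forward(self, x):                                  # x: (batch, n_features)
        # Renormalize the gains so their squares sum to k.
        g = self.gain * (self.k / self.gain.pow(2).sum()).sqrt()
        if not self.training:
            return g * x
        noise_w = (1.0 - g.pow(2)).clamp(min=0.0).sqrt()   # assumed weighting
        return g * x + noise_w * self.noise_std * torch.randn_like(x)


# After training, keep the k features with the largest |gain|; the layer is
# otherwise transparent to whatever network sits on top of it.
layer = NoisyGainSelector(n_features=100, k=10)
selected = torch.topk(layer.gain.abs(), k=layer.k).indices
```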

References

[1] Pedersen, Chris, Laure Zanna, and Joan Bruna. “Thermalizer: Stable autoregressive neural emulation of spatiotemporal chaos.” arXiv preprint arXiv:2503.18731 (2025).

[2] He, Yi, et al. “Chaos meets attention: Transformers for large-scale dynamical prediction.” arXiv preprint arXiv:2504.20858 (2025).

[3] Serrano, Louis, et al. “Zebra: In-context and generative pretraining for solving parametric PDEs.” arXiv preprint arXiv:2410.03437 (2024).

[4] Marsden, Annie, et al. “Provable Length Generalization in Sequence Prediction via Spectral Filtering.” arXiv preprint arXiv:2411.01035 (2024).

[5] Mezidi, Abdel-Rahim, et al. “A Bregman Proximal Viewpoint on Neural Operators.” International Conference on Machine Learning. 2025.

[6] Dama, Srinath, Kevin Course, and Prasanth B. Nair. “Shifting Time: Time-series Forecasting with Khatri-Rao Neural Operators.” Forty-second International Conference on Machine Learning. 2025.

[7] Pad, Pedram, et al. “SAND: One-Shot Feature Selection with Additive Noise Distortion.” arXiv preprint arXiv:2505.03923 (2025).
