Vectorized Conditional Neural Fields: A Framework for Solving Time-dependent Parametric Partial Differential Equations

Machine Learning and Simulation Lab, University of Stuttgart, Germany
¹Stuttgart Center for Simulation Science (SimTech), ²International Max Planck Research School for Intelligent Systems (IMPRS-IS)
Published at the International Conference on Machine Learning (ICML) 2024, Vienna, Austria

Figure: Predictions for the density channel of FNO and VCNeF for 2D Compressible Navier-Stokes (CNS) with a spatial resolution of 64 × 64.

Abstract

Transformer models are increasingly used for solving Partial Differential Equations (PDEs). Several adaptations have been proposed, all of which suffer from the typical problems of Transformers, such as quadratic memory and time complexity. Furthermore, all prevalent architectures for PDE solving lack at least one of several desirable properties of an ideal surrogate model, such as (i) generalization to PDE parameters not seen during training, (ii) spatial and temporal zero-shot super-resolution, (iii) continuous temporal extrapolation, (iv) support for 1D, 2D, and 3D PDEs, and (v) efficient inference for longer temporal rollouts. To address these limitations, we propose Vectorized Conditional Neural Fields (VCNeFs), which represent the solution of time-dependent PDEs as neural fields. Contrary to prior methods, however, VCNeFs compute, for a set of multiple spatio-temporal query points, their solutions in parallel and model their dependencies through attention mechanisms. Moreover, VCNeF can condition the neural field on both the initial conditions and the parameters of the PDEs. An extensive set of experiments demonstrates that VCNeFs are competitive with and often outperform existing ML-based surrogate models.

Introduction

The simulation of physical systems such as weather forecasting and fluid dynamics relies on solving Partial Differential Equations (PDEs). Nowadays, Machine Learning (ML) is increasingly used for solving PDEs due to several advantages over classical numerical solvers. For instance, ML surrogate models enable faster simulation times, are differentiable, and can be applied even when the underlying PDEs are not known exactly. However, if knowledge about the PDEs is available, it can be incorporated into the model, as in Physics-Informed Neural Networks (PINNs). The following figure shows a neural surrogate for solving PDEs that takes the initial value (the solution at time \(t = 0\)) as input and predicts the solutions for future times. Intuitively, the problem of solving PDEs can be interpreted as an image-to-video prediction task.

Partial Differential Equations

Partial Differential Equations (PDEs) are equations that relate a function to its partial derivatives. Solving a PDE means finding (an approximation of) the function that satisfies the equation described by the PDE together with additional constraints such as the initial condition and boundary conditions. In our work, we focus on time-dependent PDEs, which contain a temporal coordinate describing how the function evolves over time. The following example shows the 1D Burgers' equation, which models diffusive waves in fluid dynamics over time.

\[ \partial_t u(t, \boldsymbol{x}) + u(t, \boldsymbol{x}) \partial_x u(t, \boldsymbol{x}) = \frac{\nu}{\pi} \partial_{xx} u(t, \boldsymbol{x}) \]

where \( \boldsymbol{x} \) represents the spatial coordinate and \( t \) the temporal coordinate. The solution of the PDE is the function \( u \), which is sought analytically or, in our setting, learned by the neural network.

Motivation

Despite recent advances in neural architectures for PDE solving, current methods lack several of the characteristics of an ideal PDE solver: (i) generalization to different Initial Conditions (ICs), (ii) generalization to different PDE parameters, (iii) support for 1D, 2D, and 3D PDEs, (iv) stability over long rollouts, (v) temporal extrapolation, and (vi) spatial and temporal super-resolution capabilities, all with affordable cost, high speed, and accuracy. Towards developing a model that encompasses these ideal characteristics, we propose the Vectorized Conditional Neural Field (VCNeF), a linear-transformer-based conditional neural field that solves PDEs continuously in time, endowing the model with temporal as well as spatial Zero-Shot Super-Resolution (ZSSR) capabilities. The model introduces a new mechanism to condition a neural field on ICs and PDE parameters to achieve generalization to both ICs and PDE parameter values not seen during training. While modeling the solution with neural fields such as PINNs naturally provides temporal and spatial ZSSR, these methods are inefficient since they must be queried separately for every temporal and spatial location in the domain. We achieve faster training and inference by vectorizing these computations on GPUs. Moreover, the proposed method explicitly models dependencies between multiple simultaneous spatio-temporal queries to the model.

Background and Preliminaries

Neural Fields. In physics, a field is a quantity that is defined for all spatial and temporal coordinates. Neural Fields (NeFs) learn a function \( f \) which maps the spatial and temporal coordinates (i.e., \(\boldsymbol{x} \in \mathbb{R}^d \), \(t \in \mathbb{R}^+\) respectively) to a quantity \(q \in \mathbb{R}^c \). Mathematically, a neural field can be expressed as a function \[ f_\theta: (\mathbb{R}^+ \times \mathbb{R}^d) \rightarrow \mathbb{R}^c \text{ with } (t, \boldsymbol{x}) \mapsto q = u(t, \boldsymbol{x})\] that is parametrized by a neural network with parameters \( \theta \).
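To make the mapping concrete, here is a minimal sketch of a neural field in PyTorch. The specific architecture (a small MLP with GELU activations and a hidden width of 128) is an illustrative assumption, not the architecture used in the paper:

```python
import torch
import torch.nn as nn

class NeuralField(nn.Module):
    """Minimal neural field f_theta: (t, x) -> q, realized as a plain MLP
    over the concatenated temporal and spatial coordinates."""
    def __init__(self, d: int = 1, c: int = 1, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1 + d, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, c),
        )

    def forward(self, t: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # t: (..., 1), x: (..., d) -> q: (..., c)
        return self.mlp(torch.cat([t, x], dim=-1))
```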

Conditional Neural Fields. Conditional Neural Fields (CNeFs) extend NeFs with a conditioning factor \( \boldsymbol{z} \in \mathbb{R}^n \) to influence the output of the neural field. The conditioning factor was originally introduced for computer vision to control the colors or shapes of objects that are being modeled. In contrast, we condition the neural field, which models the solution of the PDE, on the initial value or IC and the PDE parameters. Thus, the conditioning factor influences the entire field. This leads to the function \[ f_\theta: (\mathbb{R}^+ \times \mathbb{R}^d \times \mathbb{R}^n ) \rightarrow \mathbb{R}^c \text{ with } (t, \boldsymbol{x}; \boldsymbol{z}) \mapsto q = u(t, \boldsymbol{x}; \boldsymbol{z}) \] that is parametrized by a neural network with parameters \( \theta \) and \( \boldsymbol{z} \) influences the modeled function \( u \).
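Continuing the sketch above, one simple way to realize the conditioning (among several used in the literature) is to concatenate \( \boldsymbol{z} \) with the coordinates. This is an illustrative choice, not necessarily the paper's conditioning mechanism:

```python
import torch
import torch.nn as nn

class ConditionalNeuralField(nn.Module):
    """CNeF sketch: the field output depends on a conditioning vector z,
    here injected by simple concatenation with the coordinates."""
    def __init__(self, d: int = 1, n: int = 16, c: int = 1, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1 + d + n, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, c),
        )

    def forward(self, t: torch.Tensor, x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        # t: (..., 1), x: (..., d), z: (..., n) -> q: (..., c)
        return self.mlp(torch.cat([t, x, z], dim=-1))
```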

Illustrations: a Neural Field and a Conditional Neural Field.
Method

Vectorized Conditional Neural Fields. Typically, a (conditional) neural field generates the output quantities for all input spatial and temporal coordinates in multiple, independent forward passes. Training and inference times can be improved by processing multiple inputs in parallel on the GPU, which is possible since all forward passes are independent. However, particularly when solving PDEs, there are spatial dependencies between different input spatial coordinates that are not exploited by CNeFs, even when multiple inputs are processed in parallel. Consequently, we propose extending CNeFs to

  • take a variable-size vector of arbitrary spatial coordinates (a set of query points) as input,
  • exploit the dependencies of the input coordinates when generating the outputs,
  • generate all outputs for the inputs in one forward pass.

Hence, we name our proposed model Vectorized Conditional Neural Field since it implicitly generates a vectorization of the input spatial coordinates for a given time \( t \). The VCNeF model represents a function \[ f_\theta: (\mathbb{R}_{+} \times \mathbb{R}^{s \times d}) \rightarrow \mathbb{R}^{s \times c} \\ \text{ with } (t, \boldsymbol{X}) \mapsto u(t, \boldsymbol{X}) = \begin{pmatrix} u(t, \boldsymbol{x_1}) \\ \vdots \\ u(t, \boldsymbol{x_s}) \end{pmatrix} \] where \( u(t, \boldsymbol{x_i}) \) denotes the PDE solution for the spatial coordinates \( \boldsymbol{x_i} \). Note that we do not impose a structure on the spatial coordinates \( \boldsymbol{x_i} \) and that the number of spatial points (i.e., \( s \) ) can be arbitrary. The model can process multiple timesteps \( t \) in parallel on the GPU to further improve the training and inference time since VCNeF does not exploit dependencies between the temporal coordinates.
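The following toy sketch illustrates the vectorized evaluation: all \( s \) query points are embedded, interact through self-attention, and are decoded in a single forward pass. For brevity it uses standard softmax attention and omits the conditioning on the IC and PDE parameters; the actual model uses a linear Transformer and the modulation mechanism described below:

```python
import torch
import torch.nn as nn

class VectorizedFieldSketch(nn.Module):
    """Toy vectorized field: embeds all s query points, lets them exchange
    information via self-attention, and decodes all solutions at once."""
    def __init__(self, d: int = 1, c: int = 1, dim: int = 128, heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(1 + d, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.decode = nn.Linear(dim, c)

    def forward(self, t: torch.Tensor, X: torch.Tensor) -> torch.Tensor:
        # t: (batch, 1), X: (batch, s, d) -> (batch, s, c)
        s = X.shape[1]
        tt = t.unsqueeze(1).expand(-1, s, -1)       # broadcast t to all points
        h = self.embed(torch.cat([tt, X], dim=-1))  # (batch, s, dim)
        h, _ = self.attn(h, h, h)                   # model spatial dependencies
        return self.decode(h)
```

Since neither \( s \) nor \( t \) is fixed, a trained model of this form can be queried on a finer spatial grid or at intermediate times, which is what enables zero-shot super-resolution in space and time.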

VCNeF Properties. The design of the proposed VCNeF model yields the following properties: (i) generalization to PDE parameters not seen during training, (ii) spatial and temporal zero-shot super-resolution (i.e., increasing the spatial and temporal resolution after training), (iii) accelerated training and inference due to vectorization, and (iv) support for including a physics-aware loss function.

Neural Architecture. VCNeF uses a Linear Transformer to produce an attention-refined representation of the initial condition and PDE parameters. In the modulation blocks, these attention-refined latent representations modulate a vectorized conditional neural field. The vectorized conditional neural field can be queried with temporal and spatial coordinates and uses a self-attention mechanism to leverage spatial dependencies between the generated solution points. To reduce the computational cost for 2D and 3D PDEs, the spatial domain is divided into non-overlapping patches, as in Vision Transformers (ViTs). However, unlike a traditional ViT, our patch generation has two branches, producing patches of a smaller size (\(p_S = 4\) or \(4 \times 4\)) and of a larger size (\(p_S = 16\) or \(16 \times 16\)), since we aim to capture the dynamics accurately at multiple scales. A sketch of this two-branch patching is shown below.
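The snippet below sketches the two-branch patch embedding for a 1D signal. Embedding each branch with a strided convolution and fusing the branches by token concatenation are our illustrative assumptions; the paper's exact fusion may differ:

```python
import torch
import torch.nn as nn

class DualPatchEmbed1D(nn.Module):
    """Two-branch patching sketch: non-overlapping patches of size 4 and 16
    are embedded separately (strided convolutions) and concatenated as tokens."""
    def __init__(self, channels: int = 1, dim: int = 128):
        super().__init__()
        self.small = nn.Conv1d(channels, dim, kernel_size=4, stride=4)
        self.large = nn.Conv1d(channels, dim, kernel_size=16, stride=16)

    def forward(self, u0: torch.Tensor) -> torch.Tensor:
        # u0: (batch, channels, s) with s divisible by 16
        fine = self.small(u0).transpose(1, 2)    # (batch, s/4, dim)
        coarse = self.large(u0).transpose(1, 2)  # (batch, s/16, dim)
        return torch.cat([fine, coarse], dim=1)  # multi-scale token sequence
```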

Experiments and Results

In our experiments, we focus on solving initial value problems for the 1D Burgers', 1D Advection, and Compressible Navier-Stokes (CNS) equations from PDEBench. We choose neural-operator, neural-field-based, and transformer-based models as baselines: the Fourier Neural Operator (FNO), MP-PDE, CORAL, OFormer, and the Galerkin Transformer. Except for CORAL, all baseline models are trained in an autoregressive fashion.

Comparison to state-of-the-art Baselines for Fixed PDE Parameter Value and Resolutions

Relative changes in parentheses are with respect to FNO.

| PDE | Model | nRMSE (↓) | bRMSE (↓) |
| --- | --- | --- | --- |
| 1D Burgers' | FNO | 0.0987 | 0.0225 |
| | MP-PDE | 0.3046 (+208.7%) | 0.0725 (+221.7%) |
| | CORAL | 0.2221 (+125.1%) | 0.0515 (+128.2%) |
| | OFormer | 0.1035 (+4.9%) | 0.0215 (-4.5%) |
| | Galerkin Transformer | 0.1651 (+67.3%) | 0.0366 (+62.3%) |
| | VCNeF | 0.0824 (-16.5%) | 0.0228 (+1.3%) |
| 1D Advection | FNO | 0.0190 | 0.0239 |
| | MP-PDE | 0.0195 (+2.7%) | 0.0283 (+18.4%) |
| | CORAL | 0.0198 (+4.3%) | 0.0127 (-46.8%) |
| | OFormer | 0.0118 (-38.0%) | 0.0073 (-69.6%) |
| | Galerkin Transformer | 0.0621 (+227.1%) | 0.0349 (+46.2%) |
| | VCNeF | 0.0165 (-13.0%) | 0.0088 (-63.2%) |
| 1D Compressible Navier-Stokes | FNO | 0.5722 | 1.9797 |
| | CORAL | 0.5993 (+4.7%) | 1.5908 (-19.6%) |
| | OFormer | 0.4415 (-22.9%) | 2.0478 (+3.4%) |
| | Galerkin Transformer | 0.7019 (+22.7%) | 3.0143 (+52.3%) |
| | VCNeF | 0.2943 (-48.6%) | 1.3496 (-31.8%) |
| 2D Compressible Navier-Stokes | FNO | 0.5625 | 0.2332 |
| | Galerkin Transformer | 0.6702 (+19.2%) | 0.8219 (+252.4%) |
| | VCNeF | 0.1994 (-64.6%) | 0.0904 (-61.2%) |
| 3D Compressible Navier-Stokes | FNO | 0.8138 | 6.0407 |
| | VCNeF | 0.7086 (-12.9%) | 4.8922 (-19.0%) |

Spatial Zero-Shot Super-Resolution
Trained on a lower spatial resolution and tested on higher spatial resolutions

| PDE | Spatial res. | Model | nRMSE (↓) | bRMSE (↓) |
| --- | --- | --- | --- | --- |
| 1D CNS | 256 | FNO | 0.5722 | 1.9797 |
| | | OFormer | 0.4415 | 2.0478 |
| | | VCNeF | 0.2943 | 1.3496 |
| | 512 | FNO | 0.6610 | 2.7683 |
| | | OFormer | 0.4657 | 2.5618 |
| | | VCNeF | 0.2943 | 1.3502 |
| | 1024 | FNO | 0.7320 | 3.5258 |
| | | OFormer | 0.4655 | 2.5526 |
| | | VCNeF | 0.2943 | 1.3510 |
| 3D CNS | 32 × 32 × 32 | FNO | 0.8138 | 6.0407 |
| | | VCNeF | 0.7086 | 4.8922 |
| | 64 × 64 × 64 | FNO | 0.9452 | 8.7068 |
| | | VCNeF | 0.7228 | 5.1495 |
| | 128 × 128 × 128 | FNO | 1.0077 | 9.8633 |
| | | VCNeF | 0.7270 | 5.3208 |

Temporal Zero-Shot Super-Resolution
Trained on a lower temporal resolution and tested on higher temporal resolutions

| PDE | Temporal res. | Model | nRMSE (↓) | bRMSE (↓) |
| --- | --- | --- | --- | --- |
| 1D CNS | 41 | FNO | 0.5722 | 1.9797 |
| | | CORAL | 0.5993 | 1.5908 |
| | | VCNeF | 0.2943 | 1.3496 |
| | 82 | FNO + Interp. | 0.5667 | 1.9639 |
| | | CORAL | 1.1524 | 3.7960 |
| | | VCNeF | 0.2965 | 1.3741 |
| 3D CNS | 11 | FNO | 0.8138 | 6.0407 |
| | | VCNeF | 0.7086 | 4.8922 |
| | 21 | FNO + Interp. | 0.8099 | 6.1938 |
| | | VCNeF | 0.7106 | 5.1446 |

Generalization to Unseen PDE Parameter Values
Models are trained on a set of PDE parameter values and tested on values not seen during training.

The experiments demonstrate that VCNeF performs competitively with the baselines and often outperforms them. Furthermore, the model exhibits zero-shot super-resolution capabilities in space and time and generalizes to unseen PDE parameter values.

Qualitative Results

Here, we compare visualizations of the predictions against the ground truth for the 1D Burgers', 1D Advection, and 2D Compressible Navier-Stokes PDEs. The 2D CNS dataset has four channels, namely density, velocity-x, velocity-y, and pressure, and we visualize the predictions of our VCNeF model alongside the ground truth data.

BibTeX


@inproceedings{hagnberger2024vecnef,
  title={Vectorized Conditional Neural Fields: A Framework for Solving Time-dependent Parametric Partial Differential Equations}, 
  author={Jan Hagnberger and Marimuthu Kalimuthu and Daniel Musekamp and Mathias Niepert},
  year={2024},
  eprint={2406.03919},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}