Introduction to Polynomial Chaos Expansions


Published: September 2021

These notes are still quite rough - I’m still editing them at the moment, and will update with a revised version in a couple days.


A Short Protocol for Polynomial Chaos Expansion (PCE)

The following are methods for solving the forward problem: Monte Carlo integration, point collocation, pseudo-spectral projection, and intrusive Galerkin projection.

The approach of PCE is to model a deterministic forward problem $u(q)$ by an expansion: \(u(q) \approx \hat{u}(q) = \sum_{i=1}^{N} c_i P_i (\mathbf{\xi})\) where $q$ are the model parameters, $\mathbf{\xi}$ are standardized random variables, and $P_i$ are polynomials that are orthogonal with respect to the probability density function of $\mathbf{\xi}$. The $c_i$ are called the Fourier coefficients.

Setting up a surrogate PCE model

  1. Declare priors for parameters in model-space ($q_i$) and PCE-space ($\xi_i$)
  2. Set up transform from model-space to PCE-space. For example,

Let $q_i \sim \text{Uniform}(a, b)$ and $\xi_i \sim \text{Uniform}(-1, 1)$. Then, \(q_i = F^{-1}_i (T(\xi_i))\) where $F_{i}^{-1}$ is the inverse cumulative distribution function (cdf) of $q_i$, and $T$ is the cdf of $\xi_i$.
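As a minimal sketch of this transform (assuming SciPy, and using hypothetical bounds `a`, `b` for the model-space prior):

```python
import numpy as np
from scipy import stats

# Hypothetical example: model-space prior q_i ~ Uniform(a, b),
# PCE-space variable xi_i ~ Uniform(-1, 1)
a, b = 2.0, 5.0
model_prior = stats.uniform(loc=a, scale=b - a)   # F_i
pce_prior = stats.uniform(loc=-1.0, scale=2.0)    # T

def to_model_space(xi):
    """Map PCE-space samples xi to model-space samples q via q = F^{-1}(T(xi))."""
    return model_prior.ppf(pce_prior.cdf(xi))

xi_samples = pce_prior.rvs(size=1000)
q_samples = to_model_space(xi_samples)            # uniformly distributed on [a, b]
```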

  3. Construct an $N_P$-dimensional basis of orthogonal polynomials, where

\(N_P = \frac{(N+P)!}{N! P!} - 1\) where $N$ is the total number of random variables $\xi_i$ and $P$ is the maximum order of the polynomials.

  4. Generate $M$ parameter sets sampled from the priors of $\mathbf{\xi}$, and arrange them into an $M \times (N_P + 1)$ matrix $\mathbf{H}$.
\[\mathbf{H} = \begin{bmatrix} P_0 (\mathbf{\xi_1}) & P_1 (\mathbf{\xi_1}) & \cdots & P_{N_P} (\mathbf{\xi_1}) \\ P_0 (\mathbf{\xi_2}) & P_1 (\mathbf{\xi_2}) & \cdots & P_{N_P} (\mathbf{\xi_2}) \\ \vdots & \vdots & \ddots & \vdots \\ P_0 (\mathbf{\xi_M}) & P_1 (\mathbf{\xi_M}) & \cdots & P_{N_P} (\mathbf{\xi_M}) \\ \end{bmatrix}\]
  5. Evaluate the forward model at the selected points to yield an $M \times 1$ output vector
\[\mathbf{z} = [ z(t, \mathbf{\xi_1}), z(t, \mathbf{\xi_2}), \cdots z(t, \mathbf{\xi_M}) ]^T\]
  6. Estimate the Fourier coefficients from
\[\begin{aligned} \mathbf{z} &= \mathbf{H} \mathbf{c}, \\ \mathbf{c} &= [c_0(t), c_1(t), \cdots, c_{N_P}(t)]^T \end{aligned}\]

$\mathbf{c}$ can be found by minimizing the 2-norm: \(|| \mathbf{z} - \mathbf{H}\mathbf{c} ||\)

for which the least-squares solution is: \(\mathbf{\hat{c}} = (\mathbf{H^T}\mathbf{H})^{-1}\mathbf{H^T} \mathbf{z}\)
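A small NumPy sketch of this fit, with stand-in values for $\mathbf{H}$ and $\mathbf{z}$; `np.linalg.lstsq` is numerically preferable to forming $(\mathbf{H^T}\mathbf{H})^{-1}$ explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)
M, n_terms = 200, 6                       # samples, and number of basis terms (N_P + 1)
H = rng.standard_normal((M, n_terms))     # stand-in for the evaluated basis matrix
z = rng.standard_normal(M)                # stand-in for forward-model outputs

# Least-squares estimate of the Fourier coefficients
c_hat, *_ = np.linalg.lstsq(H, z, rcond=None)

# Equivalent (but less stable) normal-equations form:
# c_hat = np.linalg.solve(H.T @ H, H.T @ z)
```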

Using the PCE model for Bayesian inference

We are interested in estimating the posterior distribution \(\pi(\mathbf{q} \mid \mathbf{d}) \propto \mathcal{L}(\mathbf{d} - \mathbf{z}(t; \mathbf{\xi})) \cdot \pi_0(\mathbf{q})\)

where $\mathcal{L}$ is the likelihood, $\mathbf{d}$ is the data, $\mathbf{q}$ are the parameters in model-space, and $\pi_0$ is the prior distribution, also in model-space.

  1. Set model-space priors $\pi_0(\mathbf{q})$
  2. Construct PCE model $\mathbf{z}(\mathbf{\xi})$ over selected parameter space.
  3. Draw samples $\{ \mathbf{q}^{(k)} \}^{K}_{k=1}$ from the priors;
  • for k = 1:K
    • Evaluate prior and likelihood
    • For importance sampling:
      • Evaluate posterior/prior ratios for each sample, then
      • Normalize resulting weights
  4. Transform samples $\mathbf{\xi}$ back to model-space to recover posterior distributions.
  5. Find MAP estimates by: \(\hat{q}_{\text{MAP}} = \arg\min_\mathbf{q} (-\pi(\mathbf{q} \mid \mathbf{d}))\)
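A toy sketch of the importance-sampling route (the scalar surrogate, Gaussian likelihood, noise level, and prior bounds below are all illustrative assumptions, not part of the notes above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Toy stand-ins: a scalar surrogate z(xi), one datum d, and an assumed noise level sigma
def surrogate(xi):
    return 1.0 + 2.0 * xi + 0.25 * (3.0 * xi**2 - 1.0)

d, sigma = 1.8, 0.3
a, b = 2.0, 5.0                        # model-space prior q ~ Uniform(a, b)
K = 10_000

xi = rng.uniform(-1.0, 1.0, size=K)    # samples from the PCE-space prior
log_like = stats.norm.logpdf(d, loc=surrogate(xi), scale=sigma)

# Importance sampling with the prior as proposal: weights ~ likelihood
w = np.exp(log_like - log_like.max())
w /= w.sum()

# Transform back to model space and form weighted posterior summaries
q = stats.uniform(a, b - a).ppf(stats.uniform(-1.0, 2.0).cdf(xi))
posterior_mean = np.sum(w * q)
q_map = q[np.argmax(log_like)]         # crude MAP estimate (flat prior) over the sampled points
```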

Background

Hilbert Space and Inner Products

PCE is a Hilbert-space technique analogous to Fourier series expansions for periodic signals. A Hilbert space is a generalization of Euclidean space to spaces of possibly infinite dimension. Hilbert spaces come equipped with an inner product, which defines distances between objects in the space as well as orthogonality (two objects are orthogonal if their inner product is zero). The dot product gives both lengths and angles:

\[\mathbf{x} \cdot \mathbf{y} = ||\mathbf{x}|| \ ||\mathbf{y}|| \cos \theta\]

More generally, a vector space with a qualifying inner product is called an inner product space; an inner product space that is also complete (with respect to the norm induced by the inner product) is a Hilbert space. The dot product is one type of inner product, but others are possible. An inner product must be:

1. symmetric (conjugate-symmetric for complex spaces),
2. linear in its first argument,
3. positive definite

More on Hilbert spaces

The coefficients $c_i$ are found by \(c_i = \frac{\langle u , P_i \rangle}{\langle P_i , P_i \rangle}\)

One way to compute these inner products is quadrature integration, in which an integral is approximated by a weighted sum of the integrand evaluated at a set of nodes (e.g. the roots of the orthogonal polynomials, as in Gaussian quadrature).
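For example, a sketch of estimating a single coefficient by Gauss-Legendre quadrature for a Uniform(-1, 1) input (the toy model $u(\xi) = e^{\xi}$ is just for illustration):

```python
import numpy as np
from numpy.polynomial import legendre

def projection_coefficient(u, i, n_quad=32):
    """Estimate c_i = <u, P_i> / <P_i, P_i> for a Uniform(-1, 1) input
    using Gauss-Legendre quadrature (nodes/weights integrate over [-1, 1])."""
    nodes, weights = legendre.leggauss(n_quad)
    P_i = legendre.Legendre.basis(i)(nodes)      # i-th Legendre polynomial at the nodes
    numerator = np.sum(weights * u(nodes) * P_i)
    denominator = np.sum(weights * P_i * P_i)    # equals 2 / (2i + 1)
    return numerator / denominator

# Toy model: u(xi) = exp(xi); its degree-2 Legendre coefficient
c_2 = projection_coefficient(np.exp, 2)
```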

Multivariate Polynomial Basis

Consider the following PCE expansion for random output $Y = M(\mathbf{X})$,

\[Y = M(\mathbf{X}) \approx \sum_{\alpha \in \mathcal{N}^m} c_\alpha \Phi_\alpha (\mathbf{X})\]

where

  • $Y$ is the (random) model output,
  • $M$ is our model,
  • $\mathbf{X}$ are model parameters (aka ‘input variables’)
  • $m$ is the dimensionality of $\mathbf{X}$
  • $\Phi_\alpha$ are the PCE basis functions, i.e. multivariate orthogonal/orthonormal polynomials
  • $c_\alpha$ are Fourier coefficients/coordinates

The joint prior probability density function (PDF) of the (random) input parameters $\mathbf{X}$ is given by (assuming independent components)

\[f_\mathbf{X} (\mathbf{x}) = \prod_{i=1}^{m} f_{X_i}(x_i)\]

where $\mathbf{x}$ is a single realization (draw/sample).

A basis of orthogonal polynomials satisfies the property that the inner product of any two distinct polynomials in the basis is zero. In the univariate case, let $P_{k}^i, \ k \in \mathcal{N}$ be the $k$-th polynomial component for the input variable $X_i$. Note $\mathcal{N}$ is used here (and below) to denote the set of natural numbers (non-negative integers). Then,

\[\langle P^{i}_j , P^{i}_k \rangle = \int P^{i}_j (u) P^{i}_k (u) f_{X_i}(u) du = \gamma_{j}^{i} \delta_{jk}\]

When $X_i \sim \text{Uniform}(-1, 1)$, the $P^{i}_k$ are the Legendre polynomials; when $X_i \sim \text{Normal}(0, 1)$, they are the Hermite polynomials.

An orthonormal basis can be constructed by normalizing the $P^{i}_j$:

\[\begin{aligned} \Phi_{j}^{i} &= \frac{P^{i}_j}{\sqrt{\gamma^{i}_j}} \\ i = 1, \dots &m, \quad j \in \mathcal{N} \end{aligned}\]
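As a quick numerical check (assuming a Uniform(-1, 1) input, so the Legendre family applies), the normalization and resulting orthonormality can be verified with Gauss-Legendre quadrature:

```python
import numpy as np
from numpy.polynomial import legendre

# Normalize Legendre polynomials so that <Phi_j, Phi_k> = delta_jk under f_X = 1/2 on [-1, 1]
nodes, weights = legendre.leggauss(16)
density = 0.5                                      # f_X(u) for Uniform(-1, 1)

def phi(j, x):
    P_j = legendre.Legendre.basis(j)(x)
    gamma_j = np.sum(weights * legendre.Legendre.basis(j)(nodes) ** 2 * density)
    return P_j / np.sqrt(gamma_j)

# Gram matrix of the first few normalized polynomials -> approximately the identity
G = np.array([[np.sum(weights * phi(j, nodes) * phi(k, nodes) * density)
               for k in range(4)] for j in range(4)])
```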

Tensor product construction

The above tells us how to construct a basis for a single input parameter, but how do we construct a basis for the full set of $\mathbf{X}$?

To do this, we define a multivariate polynomial as the product of univariate polynomials corresponding to each input parameter $x_i$.

\[\Phi_\alpha (\mathbf{x}) := \prod_{i=1}^{m} \Phi_{\alpha_i}^{i}(x_i), \qquad E[\Phi_\alpha (\mathbf{X}) \Phi_\beta (\mathbf{X})] = \delta_{\alpha \beta}\]

where $\alpha = (\alpha_1, \dots, \alpha_m)$ is a multi-index (the degree of the polynomial in each variable, i.e. the partial degree in each dimension).
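A sketch of this tensor-product construction for two Uniform(-1, 1) inputs, using orthonormal Legendre polynomials (the multi-index set here is a full tensor grid, chosen just for illustration):

```python
import numpy as np
from itertools import product
from numpy.polynomial import legendre

# Orthonormal univariate Legendre polynomial (w.r.t. Uniform(-1, 1)): phi_j = P_j * sqrt(2j + 1)
def phi_1d(j, x):
    return legendre.Legendre.basis(j)(x) * np.sqrt(2 * j + 1)

def phi_multi(alpha, x):
    """Phi_alpha(x) = prod_i phi_{alpha_i}(x_i) for a multi-index alpha."""
    return np.prod([phi_1d(a_i, x_i) for a_i, x_i in zip(alpha, x)])

# All multi-indices with partial degree <= 2 in each of the two dimensions
multi_indices = list(product(range(3), repeat=2))
x = np.array([0.3, -0.7])
values = {alpha: phi_multi(alpha, x) for alpha in multi_indices}
```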

Computing PCE coefficients

PCEs can be written as the sum of a truncated series and a residual term:

\[Y = M(\mathbf{X}) = \mathbf{C^T}\mathbf{\Phi}(\mathbf{X}) + \varepsilon (\mathbf{X})\]

where:

  • $\mathbf{C} = \{ c_\alpha , \ \alpha \in \mathcal{A} \} = \{ c_0, \dots, c_{P-1} \}$ contains the $P$ unknown coefficients. See below for a definition of $\mathcal{A}$.
  • $\mathbf{\Phi}(\mathbf{x}) = \{ \Phi_0 (\mathbf{x}) , \dots, \Phi_{P-1} (\mathbf{x}) \}$ holds the $P$ corresponding polynomial basis functions

Procedure for PCE analysis

The idea is to find coefficients $\mathbf{\hat{C}}$ that minimize the difference between the outputs of the true model $M$ and a (sparse) PCE model.

  1. Select a ‘truncation scheme’ - that is, after computing a ‘full order’ model (see equation for ‘cardinality’), we only keep a subset of terms to obtain a ‘sparse’ PCE.

A common truncation scheme is to set some maximal order. For instance, with an $m$-dimensional input space and maximum polynomial order $p$, we can write:

\[\mathcal{A}^{m, \ p} = \{ \alpha \in \mathcal{N}^m \ : \ |\alpha|_1 \leq p \}\]
  2. Simulate the model response for $n$ parameter sets sampled from the input space.
\[\mathbf{M} = \{ M(x^1), \dots M(x^n )\}^T\]
  3. Construct and evaluate the polynomial basis over the selected parameters and hold the results in an “experimental matrix” $\mathbf{A}$:
\[\begin{aligned} \mathbf{A}_{ij} &= \Phi_{j} (x^i) \\ i &= 1, \dots, n \\ j &= 0, \dots, P-1 \end{aligned}\]
  4. Finally, we solve the linear system for the coefficients: \(\mathbf{\hat{C}} = (\mathbf{A^T}\mathbf{A})^{-1}\mathbf{A^T}\mathbf{M}\)
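Putting the four steps together, a toy end-to-end sketch (assuming two Uniform(-1, 1) inputs, a total-degree truncation, and a made-up forward model):

```python
import numpy as np
from itertools import product
from numpy.polynomial import legendre

rng = np.random.default_rng(0)

# Toy setup: m = 2 Uniform(-1, 1) inputs, maximum total degree p = 3, n = 200 runs
m, p, n = 2, 3, 200
alphas = [a for a in product(range(p + 1), repeat=m) if sum(a) <= p]   # truncation set A^{m,p}

def phi(alpha, X):
    """Evaluate the multivariate orthonormal Legendre polynomial Phi_alpha at the rows of X."""
    out = np.ones(X.shape[0])
    for i, a_i in enumerate(alpha):
        out *= legendre.Legendre.basis(a_i)(X[:, i]) * np.sqrt(2 * a_i + 1)
    return out

def true_model(X):                                   # stand-in forward model
    return np.exp(X[:, 0]) * np.sin(2 * X[:, 1])

X = rng.uniform(-1, 1, size=(n, m))                  # experimental design
M_vec = true_model(X)                                # model responses
A = np.column_stack([phi(alpha, X) for alpha in alphas])   # experimental matrix
C_hat, *_ = np.linalg.lstsq(A, M_vec, rcond=None)          # PCE coefficients
```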

Evaluating PCE accuracy

We also want to optimize the polynomial order, or ‘truncation scheme.’ We could do this by:

  • Minimizing the MSE (on a different set of data) between the true model and PCEs of different orders, but this would require running the true model many times, which is computationally intensive.
  • Or, we can just re-use the same data used for calibration (i.e. compute the empirical error), but this would result in overfitting.

Alternatively, we can use Leave-one-out (LOO) cross-validation:

  1. Select an ‘experimental design’ $\mathcal{X} = \{ x^j ,\ j=1, \dots, n \}$, i.e. a low-discrepancy set of input points together with the true-model predictions at those points
  2. Build PCEs using all points but one, e.g. leaving out the $i$-th point: $\mathcal{X}_{i \neq j}$.
  3. Compute the error at the held-out $i$-th point, which is the local error at that point.
  4. Repeat this for all points $x^i$.

To get the LOO error without re-computing $n$ PCEs (for each validation point), we can instead run a single computation via a corrected LOO error:

\[E_{LOO} = \frac{1}{n} \sum^{n}_{i=1} \left( \frac{M(x^i) - M^{PC}(x^i)}{1 - h_i} \right)^2\]

where $h_i$ is the $i$-th diagonal term of the matrix \(\mathbf{A}(\mathbf{A^T}\mathbf{A})^{-1}\mathbf{A^T}\)

This effectively bypasses the last step of LOO. That is, we get the same LOO error after just a single PCE analysis (i.e. select a truncation order, construct a PCE basis, fit Fourier coefficients).
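Reusing `A`, `M_vec`, and `C_hat` from the sketch above, the corrected LOO error can be computed in a single pass; the hat-matrix diagonal is obtained here via a thin QR factorization rather than an explicit inverse:

```python
import numpy as np

def corrected_loo_error(A, M_vec, C_hat):
    """Corrected LOO error from a single PCE fit, using the diagonal of
    the 'hat' matrix A (A^T A)^{-1} A^T."""
    residuals = M_vec - A @ C_hat
    Q, _ = np.linalg.qr(A)
    h = np.sum(Q**2, axis=1)          # h_i = i-th diagonal of the hat matrix
    return np.mean((residuals / (1.0 - h)) ** 2)
```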

Curse of dimensionality

For $m$ model parameters and polynomial order $p$, the cardinality of the truncation scheme (number of terms) is:

\[\text{cardinality} (\mathcal{A}^{m, \ p}) := P = \frac{(m + p)!}{m! \ p!}\]
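To see the growth concretely, note that $P$ is the binomial coefficient $\binom{m+p}{p}$:

```python
from math import comb

# Number of PCE terms P = (m + p)! / (m! p!) = C(m + p, p) for a few cases
for m in (2, 5, 10, 20):
    for p in (2, 3, 5):
        print(m, p, comb(m + p, p))
# e.g. m = 20, p = 5 already gives 53,130 terms
```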

Compressive sensing approaches

It turns out that most coefficients are close to zero. We can therefore add a regularization term $\lambda \lVert \mathbf{C} \rVert_1$ to the least-squares objective, which penalizes large coefficient values and therefore encourages sparsity in the resulting coefficients $\mathbf{C}$, and hence in the PCE.

Least Angle Regression (LAR)

Least angle regression is one approach to this regularized minimization; it solves the LASSO problem for all values of $\lambda$ at once, producing the full regularization path. A Python implementation is available in scikit-learn.

LAR is an algorithm that finds coefficients for linear regressions:

  1. Initially, all coefficients are set to zero.
  2. Find the predictor (basis function) most correlated with the residual, and move its coefficient in the direction of that correlation until some other predictor becomes equally correlated with the residual (the correlation of the first predictor decreases as its coefficient approaches its least-squares value).
  3. Move the two coefficients together in the direction equiangular to (i.e. ‘in the middle’ between) their individual correlations.
  4. Repeat until all coefficients have entered the model.
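A sketch of the sparse fit using scikit-learn's LARS-based LASSO solver (the experimental matrix and outputs below are stand-ins; `LassoLarsCV` picks the regularization strength by cross-validation):

```python
import numpy as np
from sklearn.linear_model import LassoLarsCV

rng = np.random.default_rng(0)
A = rng.uniform(-1, 1, size=(200, 20))              # stand-in experimental matrix
M_vec = A[:, 1] - 0.5 * A[:, 4] + 0.05 * rng.standard_normal(200)

# Least angle regression solving the LASSO path, with lambda chosen by 5-fold CV
lars = LassoLarsCV(cv=5).fit(A, M_vec)
C_sparse = lars.coef_                               # most entries end up (near) zero
```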

Sobol decomposition for PCE

General Sobol method

The Sobol method looks at the contribution of individual parameters $x_i \in \mathbf{X}$ (or groups of parameters) to the total variance of the model output $M(\mathbf{X})$. The central quantities are the Sobol indices, or first-order effects, which describe the partial variance due to a single parameter or group. The first-order Sobol index for a subset $\mathbf{u}$ is given by:

\[\begin{aligned} S_\mathbf{u} &= \frac{D_\mathbf{u}}{D} = \frac{ \text{Var}[M_{\mathbf{u}} (\mathbf{X_{\mathbf{u}}})] }{ \text{Var}[M (\mathbf{X})] } \\ \mathbf{u} &= \{i_1 , \dots, i_s \} \subset \{1, \dots, m \}, \quad 1 \leq s \leq m \\ \mathbf{X_\mathbf{u}} &= \{ X_{i_1}, \dots, X_{i_s} \} \end{aligned}\]

where $D_\mathbf{u}$ and $D$ are the partial and total variances, respectively. The so-called ‘total-order’ Sobol index of $\mathbf{u}$ is one minus the fraction of the total variance contributed by all parameters outside $\mathbf{u}$.

Sobol method for PCEs

For a PCE, the term of the Sobol decomposition associated with the subset $\mathbf{u}$ collects all expansion terms whose multi-indices are nonzero only in the dimensions belonging to $\mathbf{u}$:

\[M_\mathbf{u}(\mathbf{X}) = \sum_{\alpha \in \mathcal{A}_\mathbf{u}} c_\alpha \Phi_\alpha (\mathbf{X}), \qquad \mathcal{A}_\mathbf{u} = \{ \alpha \in \mathcal{A} \mid \alpha_k \neq 0 \Leftrightarrow k \in \mathbf{u} \}\]

For an orthonormal basis, the partial variance is then simply the sum of the squared coefficients, $D_\mathbf{u} = \sum_{\alpha \in \mathcal{A}_\mathbf{u}} c_\alpha^2$.

The total-order index is analogous to the general method: we subtract from one the contribution of all terms that do not involve $\mathbf{u}$ at all, \(\widetilde{\mathcal{A}}_\mathbf{u} = \{ \alpha \in \mathcal{A} \mid \alpha_k = 0 \ \forall \ k \in \mathbf{u} \}\), giving $S^{T}_\mathbf{u} = 1 - \sum_{\alpha \in \widetilde{\mathcal{A}}_\mathbf{u}} c_\alpha^2 / D$.
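Assuming an orthonormal basis (so each term contributes a variance of $c_\alpha^2$), a sketch of recovering per-variable first-order and total indices from the multi-indices and coefficients of the earlier fit:

```python
import numpy as np

def sobol_indices(alphas, C_hat):
    """First-order and total Sobol indices from PCE coefficients
    (orthonormal basis assumed, so each term's variance is c_alpha^2)."""
    alphas = np.asarray(alphas)
    c2 = np.asarray(C_hat) ** 2
    nonconstant = alphas.sum(axis=1) > 0
    D = c2[nonconstant].sum()                        # total variance
    m = alphas.shape[1]
    first, total = np.zeros(m), np.zeros(m)
    for i in range(m):
        only_i = (alphas[:, i] > 0) & (alphas.sum(axis=1) == alphas[:, i])
        any_i = alphas[:, i] > 0
        first[i] = c2[only_i].sum() / D              # terms involving only x_i
        total[i] = c2[any_i].sum() / D               # terms involving x_i at all
    return first, total
```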