Appendix D — Mechanistically Motivated Dynamic Spatio-Temporal Models
As discussed in Section 5.3, it can be quite useful to parameterize DSTMs by considering transition matrices that are motivated by a mechanistic model. Here, we show the details of how one can do this with a partial differential equation (PDE) and an integro-difference equation (IDE).
D.1 Example of a Process Model Motivated by a PDE: Finite Differences
Consider the case where the parameters \(a\), \(b\), \(u\), and \(v\) in Equation 5.14 vary with space and denote the two-dimensional spatial location by the vector \(\mathbf{s}= (x,y)'\). Then the PDE is
\[ \small \frac{\partial Y}{\partial t} = \frac{\partial}{\partial x}\left(a(x,y) \frac{\partial Y}{\partial x}\right) + \frac{\partial}{\partial y}\left(b(x,y) \frac{\partial Y}{\partial y}\right) + u(x,y) \frac{\partial Y}{\partial x} + v(x,y) \frac{\partial Y}{\partial y}. \tag{D.1}\]
If we consider this process on a regular two-dimensional grid and employ a standard centered finite difference in space and a forward difference in time for Equation D.1, we obtain a lagged nearest-neighbor relationship given by
\[ \small \begin{aligned} Y_t(x,y) =~~ &\theta_{p,1}(x,y) Y_{t-\Delta_t}(x,y) + \theta_{p,2}(x,y) Y_{t-\Delta_t}(x+\Delta_x,y) \nonumber \\ &\; + \; \theta_{p,3}(x,y) Y_{t-\Delta_t}(x-\Delta_x,y) + \theta_{p,4}(x,y) Y_{t-\Delta_t}(x,y + \Delta_y) \nonumber \\ &\; + \; \theta_{p,5}(x,y) Y_{t-\Delta_t}(x,y-\Delta_y), \end{aligned} \tag{D.2}\]
where \(\Delta_t\) is a time-discretization constant, \(\Delta_x\) and \(\Delta_y\) are spatial-discretization constants, and the \(\theta\)s are defined as
\[\begin{align*} \small \theta_{p,1}(x,y) &= \left[\frac{-2 a(x,y) \Delta_t}{\Delta^2_x} + \frac{-2 b(x,y) \Delta_t}{\Delta^2_y}\right] + 1, \\ \theta_{p,2}(x,y) &= \frac{a(x+\Delta_x,y) \Delta_t}{4 \Delta_x^2} - \frac{a(x-\Delta_x,y) \Delta_t}{4 \Delta_x^2} + \frac{a(x,y) \Delta_t}{ \Delta_x^2} + \frac{u(x,y) \Delta_t}{2 \Delta_x}, \\ \theta_{p,3}(x,y) &= \frac{-a(x+\Delta_x,y) \Delta_t}{4 \Delta_x^2} + \frac{a(x-\Delta_x,y) \Delta_t}{4 \Delta_x^2} + \frac{a(x,y) \Delta_t}{ \Delta_x^2} - \frac{u(x,y) \Delta_t}{2 \Delta_x}, \\ \theta_{p,4}(x,y) &= \frac{b(x,y+\Delta_y) \Delta_t}{4 \Delta_y^2} - \frac{b(x,y-\Delta_y) \Delta_t}{4 \Delta_y^2} + \frac{b(x,y) \Delta_t}{ \Delta_y^2} + \frac{v(x,y) \Delta_t}{2 \Delta_y}, \\ \theta_{p,5}(x,y) &= \frac{-b(x,y+\Delta_y) \Delta_t}{4 \Delta_y^2} + \frac{b(x,y-\Delta_y) \Delta_t}{4 \Delta_y^2} + \frac{b(x,y) \Delta_t}{ \Delta_y^2} - \frac{v(x,y) \Delta_t}{2 \Delta_y}. \end{align*}\]
Thus, we see that the finite differences suggest that the neighbors of location \((x,y)\) at the previous time (i.e., locations \((x-\Delta_x,y)\), \((x+\Delta_x,y)\), \((x,y-\Delta_y)\), and \((x, y + \Delta_y)\)), as well as the location \((x,y)\) itself, play a role in the transition from one time to the next. Note the role of the spatially varying parameters. Let \(\mathbf{Y}_t\) be the process evaluated at all interior grid points at time \(t\), and assume the process is defined to be 0 on the boundary (for ease of presentation). Then one can write Equation D.2 as \(\mathbf{Y}_t = \mathbf{M}\mathbf{Y}_{t-\Delta_t}\), where \(\mathbf{M}\) is parameterized with the elements of \(\{\theta_{p,i}(x,y),\ i=1,\ldots,5\}\). Assume first for simplicity that there is no advection (i.e., \(u(x,y) = 0\), \(v(x,y) = 0\) for all locations) and the diffusion coefficients in the \(x\)- and \(y\)-directions are equal (i.e., \(a(x,y) = b(x,y)\)). Then, it can be shown that the transition operator \(\mathbf{M}\) is still asymmetric if the diffusion coefficients vary with space. If the diffusion coefficients are constant in space and equal (i.e., \(a=b\)), then transition-operator asymmetry is only due to the advection component.
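To make this concrete, the following is a minimal sketch in R (using the Matrix package) of how one might assemble \(\mathbf{M}\) from the \(\theta\)'s above. The function name build_M, the way the coefficient fields are supplied, and the treatment of \(a\) and \(b\) near the boundary are illustrative choices for the sketch, not part of the development above.

```r
library(Matrix)

## Minimal sketch: assemble the sparse propagator M implied by Equation D.2,
## assuming a regular nx x ny grid of interior points with spacings dx and dy,
## a time step dt, and (hypothetical) nx x ny matrices a, b, u, v of the
## spatially varying PDE coefficients evaluated at those points. Values of a
## and b needed just outside the interior are approximated by replicating the
## edge values, and the process is taken to be 0 on the boundary, so links that
## would leave the grid are simply dropped.
build_M <- function(a, b, u, v, dx, dy, dt) {
  nx <- nrow(a); ny <- ncol(a)
  ap <- rbind(a[1, ], a, a[nx, ])              # pad a in the x-direction
  am <- ap[1:nx, ]; aP <- ap[3:(nx + 2), ]     # a(x - dx, y) and a(x + dx, y)
  bp <- cbind(b[, 1], b, b[, ny])              # pad b in the y-direction
  bm <- bp[, 1:ny]; bP <- bp[, 3:(ny + 2)]     # b(x, y - dy) and b(x, y + dy)

  ## the theta's defined above
  th1 <- 1 - 2 * a * dt / dx^2 - 2 * b * dt / dy^2
  th2 <-  (aP - am) * dt / (4 * dx^2) + a * dt / dx^2 + u * dt / (2 * dx)
  th3 <- -(aP - am) * dt / (4 * dx^2) + a * dt / dx^2 - u * dt / (2 * dx)
  th4 <-  (bP - bm) * dt / (4 * dy^2) + b * dt / dy^2 + v * dt / (2 * dy)
  th5 <- -(bP - bm) * dt / (4 * dy^2) + b * dt / dy^2 - v * dt / (2 * dy)

  idx <- matrix(seq_len(nx * ny), nx, ny)      # column-major index of grid points
  link <- function(theta, di, dj) {            # weight on Y_{t-dt}(x + di*dx, y + dj*dy)
    keep <- which(row(idx) + di >= 1 & row(idx) + di <= nx &
                  col(idx) + dj >= 1 & col(idx) + dj <= ny)
    cbind(i = idx[keep], j = idx[keep] + di + dj * nx, x = theta[keep])
  }
  trip <- rbind(link(th1, 0, 0), link(th2, 1, 0), link(th3, -1, 0),
                link(th4, 0, 1), link(th5, 0, -1))
  sparseMatrix(i = trip[, "i"], j = trip[, "j"], x = trip[, "x"],
               dims = c(nx * ny, nx * ny))
}
```

A single time step is then `Y_t <- as.vector(M %*% Y_tm1)`, where the vector stacks the interior grid values column by column (matching `idx`).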
The type of diffusion represented in Equation D.1 is typically called “Fickian” diffusion. Similar finite-difference discretizations of other diffusion representations (e.g., so-called “ecological diffusion,” \(\nabla^2 (a(x,y) Y)\)) lead to different formulations of the parameters \(\boldsymbol{\theta}_p\) (in terms of the coefficients \(a(x,y)\)), but they still correspond to a five-diagonal sparse transition operator \(\mathbf{M}\). Thus, in the context of a linear DSTM process model, we typically allow the parameters \(\boldsymbol{\theta}_p\) to be spatially explicit random processes with the possible addition of covariates, rather than model the specific diffusion-equation coefficients (e.g., \(a(x,y)\), \(b(x,y)\), \(u(x,y)\), and \(v(x,y)\) in Equation D.1) directly (although one can certainly do this, and the different diffusions correspond to different scientific interpretations). A last point to make is that different types of finite-difference discretization lead to different parameterizations (e.g., a higher-order spatial discretization leads to larger neighborhoods, and higher-order temporal differences lead to higher-order Markov models).
D.2 Example of a Process Model Motivated by a PDE: Spectral
This section considers a natural basis-function (i.e., spectral) approach to motivating a DSTM from a mechanistic model.
Consider a simple one-dimensional spatial version of the advection-diffusion PDE in Equation 5.14, and denote the spatial index by \(x\),
\[ \frac{\partial Y(x,t)}{\partial t} = a \frac{\partial^2 Y(x,t)}{\partial x^2} + b \frac{\partial Y(x,t)}{\partial x}, \tag{D.3}\]
where in this example the advection and diffusion coefficients (\(b\) and \(a\), respectively) are assumed to be constant. Now consider the solution as a superposition of Fourier functions (i.e., sines and cosines),
\[ Y_t(x) = \sum_{j=1}^J [\alpha_{j,t}(1) \cos(\omega_j x) +\alpha_{j,t}(2) \sin(\omega_j x)], \tag{D.4}\]
where \(\omega_j = 2 \pi j/\vert D_x\vert\) is the spatial frequency of a sinusoid with spatial wave number \(j=1,\dots ,J\) in the spatial domain \({D_x}\) (and \(|D_x|\) corresponds to the length of the spatial domain). For simplicity of exposition, we do not include a constant term in the expansion Equation D.4. For the \(n\) spatial locations of interest and \(n_\alpha\) Fourier basis functions, we let \(\mathbf{Y}_t = \boldsymbol{\Phi}\boldsymbol{\alpha}_t\), where \(\mathbf{Y}_t\) is an \(n\)-dimensional vector, \(\boldsymbol{\Phi}\) is an \(n \times n_\alpha\) matrix consisting of Fourier basis functions, and \(\boldsymbol{\alpha}_t\) contains the associated \(n_\alpha\) expansion coefficients, where \(n_\alpha = 2 J\).
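For example, the following is a minimal sketch in R of constructing \(\boldsymbol{\Phi}\); the spatial grid, the domain length, and the truncation \(J\) are illustrative choices for the sketch.

```r
## Minimal sketch: the n x 2J Fourier basis matrix Phi of Equation D.4,
## assuming n regularly spaced locations on a domain D_x of length L = |D_x|.
fourier_basis <- function(x, J, L) {
  omega <- 2 * pi * (1:J) / L                  # spatial frequencies omega_j
  Phi <- matrix(NA, length(x), 2 * J)
  for (j in 1:J) {
    Phi[, 2 * j - 1] <- cos(omega[j] * x)      # pairs with alpha_{j,t}(1)
    Phi[, 2 * j]     <- sin(omega[j] * x)      # pairs with alpha_{j,t}(2)
  }
  Phi
}
x   <- seq(0, 1, length.out = 100)             # hypothetical grid: n = 100 on [0, 1]
Phi <- fourier_basis(x, J = 10, L = 1)         # so that Y_t = Phi %*% alpha_t
```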
The deterministic solution of Equation D.3 gives formulas for \(\{\alpha_{j,t}(1), \alpha_{j,t}(2)\}\), which are exponentially decaying sinusoids in time:
\[ \begin{aligned} \alpha_{j,t}(1) & = \exp(-a \omega_j^2 t) \sin(b \omega_j t), \\ \alpha_{j,t}(2) & = \exp(-a \omega_j^2 t) \cos(b \omega_j t),\quad j=1,\dots ,J. \end{aligned} \]
In this case, the time evolution is given by,
\[ {\boldsymbol{\alpha}}_{j,t+\Delta_t} = {\mathbf{M}}_j {\boldsymbol{\alpha}}_{j,t},\quad j=1,\dots ,J, \]
where \(\boldsymbol{\alpha}_{j,t} \equiv (\alpha_{j,t}(1) \; \alpha_{j,t}(2))'\) and
\[ {\mathbf{M}}_j = \left[ \begin{array}{rr} e^{-a \omega_j^2 \Delta_t}\cos \{b \omega_j \Delta_t\} & e^{-a \omega_j^2 \Delta_t}\sin\{b \omega_j \Delta_t\} \\ -e^{-a \omega_j^2 \Delta_t}\sin\{b \omega_j \Delta_t\} & e^{-a \omega_j^2 \Delta_t}\cos\{b \omega_j \Delta_t\} \end{array} \right]. \]
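This form can be verified by applying the angle-addition identities to the deterministic solution above:
\[ \begin{aligned} \alpha_{j,t+\Delta_t}(1) &= e^{-a \omega_j^2 (t+\Delta_t)} \sin(b \omega_j (t+\Delta_t)) = e^{-a \omega_j^2 \Delta_t}\left[\cos(b \omega_j \Delta_t)\,\alpha_{j,t}(1) + \sin(b \omega_j \Delta_t)\,\alpha_{j,t}(2)\right], \\ \alpha_{j,t+\Delta_t}(2) &= e^{-a \omega_j^2 (t+\Delta_t)} \cos(b \omega_j (t+\Delta_t)) = e^{-a \omega_j^2 \Delta_t}\left[\cos(b \omega_j \Delta_t)\,\alpha_{j,t}(2) - \sin(b \omega_j \Delta_t)\,\alpha_{j,t}(1)\right], \end{aligned} \]
which are precisely the rows of \(\mathbf{M}_j\).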
This motivates the \(2J\)-dimensional linear DSTM process model,
\[ {\boldsymbol{\alpha}}_{t} = \mathbf{M}_\alpha {\boldsymbol{\alpha}}_{t-1} + {\boldsymbol{\eta}}_{t}\,, \]
where \({\boldsymbol{\alpha}}_t \equiv ({\boldsymbol{\alpha}}'_{1,t} \; \ldots \; {\boldsymbol{\alpha}}'_{J,t})'\) for \(\boldsymbol{\alpha}_{j,t} = (\alpha_{j,t}(1),\alpha_{j,t}(2))'\), \(\boldsymbol{\eta}_t = (\boldsymbol{\eta}'_{1,t},\ldots,\boldsymbol{\eta}'_{J,t})'\) for \(\boldsymbol{\eta}_{j,t} = (\eta_{j,t}(1),\eta_{j,t}(2))'\), \(\mathbf{M}_\alpha\) is a \(2J \times 2J\) block-diagonal matrix with blocks \(\{\mathbf{M}_j: j=1,\ldots,J\}\), and we have assumed that \(\Delta_t=1\). This then suggests block-diagonal parameterizations in which the elements of the \(2 \times 2\) block associated with each pair of Fourier basis functions are unknown and must be estimated (e.g., via a Bayesian hierarchical model). The result is a very sparse representation of the transition matrix, \(\mathbf{M}_\alpha\), when \(J\) is large.
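Continuing the illustrative sketch above (with \(\boldsymbol{\Phi}\) and \(\omega_j\) as constructed there), a block-diagonal \(\mathbf{M}_\alpha\) with known \(a\) and \(b\) can be assembled and used to step the spectral coefficients forward as follows. The values of \(a\), \(b\), and the noise standard deviations are purely illustrative; in the DSTM these blocks would typically be treated as unknown.

```r
## Minimal sketch: block-diagonal propagator M_alpha and one step of the
## spectral DSTM, assuming Delta_t = 1 and the Phi and omega_j of the sketch above.
J <- 10; a <- 0.001; b <- 0.1; dt <- 1
omega   <- 2 * pi * (1:J) / 1
M_alpha <- matrix(0, 2 * J, 2 * J)
for (j in 1:J) {
  decay <- exp(-a * omega[j]^2 * dt)
  rot   <- b * omega[j] * dt
  Mj    <- decay * matrix(c(cos(rot), -sin(rot),   # 2 x 2 block M_j
                            sin(rot),  cos(rot)), 2, 2)
  M_alpha[(2 * j - 1):(2 * j), (2 * j - 1):(2 * j)] <- Mj
}

## One step of alpha_t = M_alpha %*% alpha_{t-1} + eta_t, and Y_t = Phi %*% alpha_t
alpha_prev <- rnorm(2 * J, sd = 0.1)
alpha_t    <- M_alpha %*% alpha_prev + rnorm(2 * J, sd = 0.01)
Y_t        <- Phi %*% alpha_t
```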
D.3 Example of a Process Model Motivated by an IDE
The stochastic IDE framework discussed in Chapter 5 naturally motivates a DSTM process model. Consider a decomposition similar to Equation 5.24, where we let the process \(\{ \widetilde{Y}_t(\mathbf{s}): \mathbf{s}\in D_s\}\) be decomposed as
\[ \widetilde{Y}_t(\mathbf{s})=\mathbf{x}_t(\mathbf{s})' \boldsymbol{\beta}+ Y_t(\mathbf{s})+\nu_t(\mathbf{s})\,, \]
where \(\{Y_t(\mathbf{s}): \mathbf{s}\in D_s\}\) is assumed to be a dynamical process, and \(\nu_t (\mathbf{s})\) is a non-dynamical process in the sense that it does not exhibit Markovian temporal dependence. Now, we assume that \(\{ Y_t(\mathbf{s})\}\) follows a stochastic IDE model as in Equation 5.9. That is, for \(\mathbf{s}\in D_s\),
\[ Y_t(\mathbf{s}) = \int_{D_s} m(\mathbf{s},\mathbf{x}; \boldsymbol{\theta}_p) \; Y_{t-1}(\mathbf{x}) \textrm{d}\mathbf{x}+\eta_t (\mathbf{s})\,, \tag{D.5}\]
where \(m(\mathbf{s},\mathbf{x}; \boldsymbol{\theta}_p)\) is the transition kernel over the domain \(D_s\), and \(\boldsymbol{\theta}_p\) are kernel parameters.
As in Equation 5.15, we assume that the dynamical process can be expanded in terms of \(n_\alpha\) basis functions, \(\{\phi_i (\mathbf{s})\colon i=1,\dots ,n_\alpha\}\). That is,
\[ Y_t (\mathbf{s}) = \sum^{n_\alpha}_{i=1} \phi_i(\mathbf{s}) \alpha_{i,t} \,. \tag{D.6}\]
Now, we can also expand the transition kernel in terms of these basis functions (although we could use different basis functions in general; see, for example, Cressie & Wikle (2011), Chapter 7):
\[ m(\mathbf{s},\mathbf{x}; \boldsymbol{\theta}_p) = \sum^{n_\alpha}_{j=1} \phi_j(\mathbf{x}) b_j(\mathbf{s}; \boldsymbol{\theta}_p). \tag{D.7}\]
Substituting Equation D.6 and Equation D.7 into Equation D.5 and, for the sake of simplicity, adding the assumption that the basis functions are orthonormal,
\[ \int_{D_s}\phi_i (\mathbf{x})\phi_j(\mathbf{x}) \textrm{d}\mathbf{x}= \left\{ \begin{array}{ll} 1,&i=j,\\ 0,&i\not= j, \end{array}\right. \]
we can show that
\[ \begin{aligned} Y_{t}(\mathbf{s}) &= \sum^{n_\alpha}_{i=1} b_i(\mathbf{s}; \boldsymbol{\theta}_p)\alpha_{i,t-1} + \eta_t (\mathbf{s})\\ &= \; \mathbf{b}'(\mathbf{s}; \boldsymbol{\theta}_p)\boldsymbol{\alpha}_{t-1} +\eta_t(\mathbf{s})\,, \end{aligned} \]
where \(\mathbf{b}(\mathbf{s}; \boldsymbol{\theta}_p)\equiv (b_1(\mathbf{s}; \boldsymbol{\theta}_p),\dots ,b_{n_\alpha}(\mathbf{s}; \boldsymbol{\theta}_p))'\) and \(\boldsymbol{\alpha}_t \equiv (\alpha_{1,t},\ldots ,\alpha_{n_\alpha,t})'\). Note also that \(\mathbf{Y}_t = \boldsymbol{\Phi}\boldsymbol{\alpha}_t\), where \(\boldsymbol{\Phi}\) is an \(n \times n_\alpha\) basis-function matrix,
\[ \boldsymbol{\Phi}\equiv\left(\begin{array}{c} \boldsymbol{\phi}(\mathbf{s}_1)'\\ \vdots\\ \boldsymbol{\phi}(\mathbf{s}_n)'\end{array}\right)\,, \]
and \(\boldsymbol{\phi}(\mathbf{s}_i) \equiv (\phi_1(\mathbf{s}_i),\dots ,\phi_{n_\alpha}(\mathbf{s}_i))'\), for \(i=1,\dots ,n\). Now, define the \(n\times n_\alpha\) matrix
\[ \mathbf{B}\equiv \left(\begin{array}{c} \mathbf{b}(\mathbf{s}_1;\boldsymbol{\theta}_p)'\\ \vdots\\ \mathbf{b}(\mathbf{s}_n;\boldsymbol{\theta}_p)'\end{array}\right)\,. \]
Then, for all \(n\) process locations, we can write
\[ \begin{aligned} \boldsymbol{\alpha}_t &= (\boldsymbol{\Phi}'\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}'\mathbf{B}\boldsymbol{\alpha}_{t-1} + (\boldsymbol{\Phi}'\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}'\boldsymbol{\eta}_t \\ &= \boldsymbol{\Phi}'\mathbf{B}\boldsymbol{\alpha}_{t-1} + \boldsymbol{\Phi}'\boldsymbol{\eta}_t\\ &\equiv \mathbf{M}_\alpha\boldsymbol{\alpha}_{t-1} +\tilde{\boldsymbol{\eta}}_t, \end{aligned} \]
where the second equality is due to orthonormality (i.e., \(\boldsymbol{\Phi}' \boldsymbol{\Phi}= \mathbf{I}\)), \(\tilde{\boldsymbol{\eta}}_t \equiv \boldsymbol{\Phi}' {\boldsymbol{\eta}}_t\) is the \(n_\alpha\)-dimensional noise process, and the \(n_\alpha \times n_\alpha\) propagator matrix is given by \(\mathbf{M}_\alpha \equiv \boldsymbol{\Phi}'\mathbf{B}\).
The truncated expansion Equation D.6 leads to a lower-dimensional dynamical process (since \(n_\alpha \ll n\)). In principle, we still have to estimate the \(n \times n_\alpha\) matrix \(\mathbf{B}\) and the covariance matrix associated with \(\tilde{\boldsymbol{\eta}}_t\). However, the IDE formulation allows the kernel \(m(\mathbf{s},\mathbf{x}; \boldsymbol{\theta}_p)\) to be parameterized parsimoniously. In some cases, one can select \(\{\phi_j(\mathbf{s})\}\) to ensure that the expansion coefficients for the kernel can be specified analytically in terms of its parameters. For example, letting \(\{\phi_j(\cdot)\}\) in Equation D.7 be Fourier basis functions allows one to parameterize the kernel in terms of its characteristic function. This can facilitate a BHM parameterization that allows kernel asymmetry and scale parameters to vary in space.
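As an illustration of the full construction (and not the implementation used in the Labs below), the following minimal sketch in R builds \(\mathbf{B}\) and \(\mathbf{M}_\alpha\) in one-dimensional space from a hypothetical three-parameter shifted-Gaussian kernel and an orthonormal Fourier basis; the domain, basis, kernel form, and parameter values are all assumptions made for the sketch.

```r
## Minimal sketch of the IDE-based construction above in one-dimensional space,
## assuming D_s = [0, 1], an orthonormal Fourier basis, and a hypothetical
## shifted-Gaussian kernel m(s, x; theta_p) with theta_p = (amplitude, offset,
## scale)'; the nonzero offset makes the kernel (and hence M_alpha) asymmetric.
n <- 100; s <- seq(0, 1, length.out = n)         # process locations s_1, ..., s_n
J <- 10;  n_alpha <- 2 * J
phi_basis <- function(x) {                       # orthonormal on [0, 1]
  out <- matrix(NA, length(x), n_alpha)
  for (j in 1:J) {
    out[, 2 * j - 1] <- sqrt(2) * cos(2 * pi * j * x)
    out[, 2 * j]     <- sqrt(2) * sin(2 * pi * j * x)
  }
  out
}
m_kernel <- function(s, x, theta) theta[1] * exp(-(x - s - theta[2])^2 / theta[3])
theta_p  <- c(1, 0.05, 0.002)                    # hypothetical kernel parameters

## b_j(s_i) = int m(s_i, x) phi_j(x) dx (projection of the kernel onto the basis),
## approximated here by a Riemann sum on a fine grid
xg  <- seq(0, 1, length.out = 400); dxg <- xg[2] - xg[1]
Phg <- phi_basis(xg)
B   <- t(sapply(s, function(si) colSums(m_kernel(si, xg, theta_p) * Phg) * dxg))

Phi     <- phi_basis(s)                          # n x n_alpha basis matrix
M_alpha <- solve(crossprod(Phi), crossprod(Phi, B))  # (Phi'Phi)^{-1} Phi'B
```

The sketch uses the general form \((\boldsymbol{\Phi}'\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}'\mathbf{B}\), which reduces to \(\boldsymbol{\Phi}'\mathbf{B}\) when the discretized basis functions satisfy \(\boldsymbol{\Phi}'\boldsymbol{\Phi}= \mathbf{I}\); note that \(\mathbf{M}_\alpha\) here depends on only the three elements of \(\boldsymbol{\theta}_p\).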
Lab 5.1 gives an introduction to the implementation of the IDE in one-dimensional space. Lab 5.2 then gives an implementation of a DSTM motivated by the stochastic IDE model to generate nowcasts for weather radar images.