Pergimus (Epilogue)

These are our last words in the book, but that doesn’t mean this is the end. The title of this epilogue is meant to convey the open-endedness of our project. Our Latin sources tell us that pergimus means “let’s go forward” or “let’s continue to progress,” derived from the verb pergere. Here’s an opportunity for you, the reader, to move beyond the previous six chapters and develop your own statistical approaches to the analysis of spatio-temporal data. You now have a sense of the motivations, main concepts, and practicalities behind spatio-temporal statistics, and the R Labs have given you an important “hands-on” perspective.

We hope you’ve seen enough to want more than what is in our book. A stepping-off point for more theory and methods might be in the pages of Chapters 6–9 of Cressie & Wikle (2011), and you can find a growing number of applications in the literature, most recently ones in which the spatio-temporal models fitted are non-Gaussian, nonlinear, and multivariate. We expect that by the time our book comes out, new applications and software for spatio-temporal statistics will have appeared, and we hope you’ll be motivated to contribute yourself.

We’ve tried to emphasize that spatio-temporal data are ubiquitous in the real, complex, messy world, and making sense of them depends on accounting for spatio-temporal dependencies. In the past, it’s been difficult to handle the complexity of such data, the hidden processes behind them, and the sheer size of many of the data sets. Yet the principles of good statistical practice still apply – they’re just a bit more involved! We should still explore our data through visualization and quantitative summaries; we should still try to build parsimonious models; we should add complexity to our models only when necessary; and we still need to evaluate our inferences through simulation and (cross-)validation. Then, after making all necessary modifications to the model, we go through the modeling–evaluation cycle again!
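The modeling–evaluation cycle above can be sketched with a toy example. The following Python sketch (the book’s labs use R; this is an illustrative stand-in, not code from the book) uses 5-fold cross-validation to compare a parsimonious trend model against a more complex one on hypothetical data, in the spirit of adding complexity only when necessary:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: a noisy linear trend in time (hypothetical, for illustration only)
t = np.linspace(0.0, 1.0, 40)
y = 2.0 * t + rng.normal(scale=0.2, size=t.size)

def fit_predict(t_train, y_train, t_test, degree):
    """Fit a polynomial trend and predict at held-out points."""
    coeffs = np.polyfit(t_train, y_train, degree)
    return np.polyval(coeffs, t_test)

# 5-fold cross-validation: compare a simple model (degree 1)
# with a more complex one (degree 5) by out-of-sample error
folds = np.array_split(rng.permutation(t.size), 5)
cv_mse = {}
for degree in (1, 5):
    sq_errs = []
    for hold in folds:
        train = np.setdiff1d(np.arange(t.size), hold)
        pred = fit_predict(t[train], y[train], t[hold], degree)
        sq_errs.append(np.mean((y[hold] - pred) ** 2))
    cv_mse[degree] = float(np.mean(sq_errs))

print(cv_mse)  # out-of-sample MSE for each candidate model
```

Comparing the two cross-validated errors tells us whether the extra complexity of the degree-5 model is actually earning its keep; the same logic extends to far richer spatio-temporal models.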

There are several challenges that are particular to spatio-temporal statistics. The obvious one is how to accommodate the complex dependencies that are typically present in spatio-temporal data. This is often exacerbated by the curse of dimensionality – that is, we may have a lot of data and/or be interested in predicting at a lot of locations in space and time. It’s often worse when we’re data-rich in one dimension (e.g., space) but data-poor in the other (e.g., time), or vice versa. These challenges can be met by focusing on parsimonious parameterizations, for example when parameterizing spatio-temporal covariance functions in the descriptive approach or propagator matrices in the dynamic approach. In the latter case, using mechanistic processes to motivate parsimonious dynamic models has proven very useful.
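As a simple illustration of a parsimonious parameterization in the descriptive approach, consider a separable spatio-temporal covariance function, C(h, τ) = σ² exp(−‖h‖/a) exp(−|τ|/b), which requires only three parameters however many space–time locations are involved. The Python sketch below (parameter values are hypothetical, chosen for illustration) builds the implied covariance matrix over a small space–time grid:

```python
import numpy as np

# Separable covariance: C(h, tau) = sigma2 * exp(-|h|/a) * exp(-|tau|/b)
# sigma2, a, b are hypothetical values chosen for illustration
sigma2, a, b = 1.0, 2.0, 3.0

s = np.linspace(0.0, 10.0, 6)   # 6 spatial locations on a transect
t = np.arange(5.0)              # 5 time points

# Spatial and temporal correlation matrices
Cs = np.exp(-np.abs(s[:, None] - s[None, :]) / a)
Ct = np.exp(-np.abs(t[:, None] - t[None, :]) / b)

# Separability makes the full space-time covariance a Kronecker product,
# so a 30 x 30 matrix is controlled by just three parameters
C = sigma2 * np.kron(Ct, Cs)

print(C.shape)              # (30, 30)
print(np.allclose(C, C.T))  # True: symmetric, as a covariance must be
```

Separability is a strong assumption that real data often violate, but it shows how a few interpretable parameters can tame a covariance matrix whose size grows with both dimensions.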

In both cases, a very effective strategy is to treat scientifically interpretable parameters as random processes (e.g., spatial stochastic processes) at a lower level in a hierarchical statistical model. We’ve also seen that if we’re not careful about how our models are parameterized, we can run into serious computational roadblocks. One of the most helpful solutions comes through basis-function expansions, where the modeling effort is typically redirected towards specifying multivariate-time-series models for the random coefficients of the basis functions.
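To make the basis-function idea concrete, here is a minimal Python sketch (the basis choice, dimensions, and propagator matrix are illustrative assumptions, not the book’s): the process Y(s, t) = Σᵢ φᵢ(s) αᵢ(t) is represented on a 50-point spatial grid, but all the dynamics live in a low-dimensional vector autoregression for the coefficients α(t):

```python
import numpy as np

rng = np.random.default_rng(0)

# Spatial domain and a small set of Gaussian radial basis functions
s = np.linspace(0.0, 1.0, 50)          # 50 spatial locations
centers = np.linspace(0.1, 0.9, 5)     # 5 basis-function centers
Phi = np.exp(-((s[:, None] - centers[None, :]) ** 2) / (2 * 0.05**2))  # 50 x 5

# VAR(1) model for the basis coefficients: alpha_t = M @ alpha_{t-1} + eta_t
M = 0.7 * np.eye(5)    # propagator matrix (simple damping, for illustration)
T = 20
alpha = np.zeros((T, 5))
for t in range(1, T):
    alpha[t] = M @ alpha[t - 1] + rng.normal(scale=0.5, size=5)

# Reconstruct the process on the full spatial grid: Y is T x 50,
# yet the dynamics are specified entirely in 5-dimensional coefficient space
Y = alpha @ Phi.T
print(Y.shape)  # (20, 50)
```

The computational payoff is that the model for the dynamics involves a 5 × 5 propagator matrix rather than a 50 × 50 one; in realistic applications the reduction is far more dramatic.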

Finally, we’ve presented some approaches to model evaluation (checking, validation, and selection) for models fitted to spatio-temporal data. However, this is very much an open area of research, and there’s no “one way” to go about it. Nor should there be: just as a medical professional evaluates a patient’s health status by running a battery of diagnostics, model evaluation comes from running a battery of diagnostic checks.

We’ve entered an interesting time where statistical applications are increasingly using machine-learning methods to answer all sorts of questions. All the rage at the time of writing are “deep learning” methods based on deep models, which are quite complicated but, as noted earlier, essentially hierarchical. The statistical and machine-learning versions of these models have much in common: both require a lot of training data and prior information, substantial regularization (smoothing), and high-performance computing. The biggest difference to date is that machine-learning methods don’t always provide estimates of uncertainty or account for uncertainties in inputs and outputs. In the near future, we expect there will be substantially more cross-fertilization between these two paradigms, leading to new avenues of research and development in spatio-temporal modeling. This is an interesting and exciting place to be, at the intersection of statistics and the data-oriented disciplines in science, technology, engineering, and mathematics (STEM) that loosely define “data science.”

We believe that the statistical methods presented in this book provide a good practical foundation for much of spatio-temporal statistics, although there are many things that we didn’t cover – not because they are less important, but mainly because of space and time limitations (pun intended!). For example, some of the topics on our “should’ve but didn’t” list are:

It’s time to take a break, but let’s continue to progress… and we invite you to share your progress and check up on ours through the book’s website: https://spacetimewithr.org.