Appendix A — Some Useful Matrix-Algebra Definitions and Properties
For the sake of completeness, we provide some definitions and properties of vectors and matrices that are needed to understand many of the formulas and equations in this book. Readers who are already familiar with matrix algebra can skip this section. Readers who would like more detail than the bare minimum presented here can find it in books on matrix algebra or multivariate statistics (e.g., Johnson & Wichern, 1992; Schott, 2017).
Vectors and matrices. In this book we denote a vector (a column of numbers) by a bold lower-case letter (Latin or Greek); for example,
\[ {\mathbf{a}} = \left[\begin{array}{c} a_1 \\ a_2\\ \vdots \\ a_p \end{array} \right] \]
represents a \(p\)-dimensional vector, and \({\mathbf{a}}' = [a_1, a_2, \ldots, a_p]\) or \((a_1, a_2,\ldots,a_p)\) is its \(p\)-dimensional transpose.
We also denote a matrix (an array of numbers) by bold upper-case letters (Latin or Greek); for example,
\[ {\mathbf{A}} = \left[ \begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \; & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pn} \\ \end{array}\right] \]
is a \(p \times n\) matrix, and \(a_{k \ell}\) corresponds to the element in the \(k\)th row and \(\ell\)th column; sometimes it is also written as \(\{a_{k \ell}\}\). The matrix transpose, \(\mathbf{A}'\), is then an \(n \times p\) matrix given by
\[ {\mathbf{A}}' = \left[ \begin{array}{cccc} a_{11} & a_{21} & \cdots & a_{p1} \\ a_{12} & a_{22} & \cdots & a_{p2} \\ \vdots & \vdots & \; & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{pn} \\ \end{array}\right]. \]
We often consider a special matrix known as the identity matrix, denoted \(\mathbf{I}_n\), which is an \(n \times n\) diagonal matrix with ones along the main diagonal (i.e., \(a_{ii}=1\) for \(i=1,\ldots,n\)) and zeros for all of the off-diagonal elements (i.e., \(a_{ij} = 0\), for \(i \neq j\)). The dimension subscript (here, \(n\)) is sometimes omitted when the context is clear.
Finally, note that a vector can be thought of as a special case of a \(p \times n\) matrix, where either \(p=1\) or \(n=1\).
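As a concrete numerical illustration (we use Python's NumPy library here; the values are arbitrary), the following sketch constructs a vector and a matrix, takes their transposes, and builds an identity matrix.

```python
import numpy as np

# a 3-dimensional column vector a and its transpose a' (a row vector)
a = np.array([[1.0], [2.0], [3.0]])   # shape (3, 1)
a_t = a.T                             # shape (1, 3)

# a 2 x 3 matrix A and its 3 x 2 transpose A'
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
A_t = A.T

# the 3 x 3 identity matrix I_3
I3 = np.eye(3)

print(a_t, A_t, I3, sep="\n")
```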
Matrix addition. Matrix addition is defined for two matrices that have the same dimension. Then, given \(p \times n\) matrices \(\mathbf{A}\) and \(\mathbf{B}\), with elements \(\{a_{k \ell}\}\) and \(\{b_{k \ell}\}\) for \(k=1,\ldots,p\) and \(\ell = 1,\ldots,n\), respectively, the elements of the matrix sum, \(\mathbf{C}= \{c_{k \ell}\} = \mathbf{A}+ \mathbf{B}\), are given by
\[ c_{k \ell} = a_{k \ell} + b_{k \ell}, \quad k=1,\ldots,p;\ \ell = 1,\ldots,n. \]
Scalar multiplication. Consider an arbitrary scalar, \(c\), and the \(p \times n\) matrix \(\mathbf{A}\). Multiplying the matrix \(\mathbf{A}\) by the scalar \(c\) gives a new matrix in which each element of \(\mathbf{A}\) is multiplied individually by \(c\). Specifically, \(c \mathbf{A}= \mathbf{A}c = \mathbf{G}\), where each element of \(\mathbf{G}= \{g_{k \ell}\}\) is given by \(g_{k \ell} = c a_{k \ell}\), for \(k=1,\ldots,p\) and \(\ell = 1,\ldots,n\).
Matrix subtraction. As with matrix addition, matrix subtraction is defined for two matrices that have the same dimension. Consider the two \(p \times n\) matrices \(\mathbf{A}\) and \(\mathbf{B}\), with elements \(\{a_{k \ell}\}\) and \(\{b_{k \ell}\}\), for \(k=1,\ldots,p\) and \(\ell = 1,\ldots,n\), respectively. The matrix difference between \(\mathbf{A}\) and \(\mathbf{B}\) is then given by \[ \mathbf{C}= \{c_{k \ell}\} = \mathbf{A}- \mathbf{B}= \mathbf{A}+ (-1)\mathbf{B}, \] where it can be seen that the elements of \(\mathbf{C}\) are given by \(c_{k \ell} = a_{k \ell} - b_{k \ell}\), for \(k=1,\ldots,p\) and \(\ell = 1,\ldots,n\). Thus, matrix subtraction is just a combination of matrix addition and scalar multiplication (by \(- 1\)).
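The following short sketch illustrates matrix addition, scalar multiplication, and matrix subtraction numerically (again in NumPy, with arbitrary values).

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

C_add = A + B          # element-wise sum: c_kl = a_kl + b_kl
G = 2.5 * A            # scalar multiplication: g_kl = c * a_kl
C_sub = A - B          # matrix subtraction
print(np.allclose(C_sub, A + (-1.0) * B))   # True: A - B = A + (-1)B
```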
Matrix multiplication. The product of the \(p \times n\) matrix \(\mathbf{A}\) and the \(n \times m\) matrix \(\mathbf{B}\) is the \(p \times m\) matrix \(\mathbf{C}\), where \(\mathbf{C}= \{c_{kj}\} = \mathbf{A}\mathbf{B}\), with \[ c_{kj} = \sum_{\ell = 1}^n a_{k \ell} b_{\ell j}, \quad k=1,\ldots,p;\ j=1,\ldots,m. \] Thus, for the matrix product \(\mathbf{A}\mathbf{B}\) to exist, the number of columns in \(\mathbf{A}\) must equal the number of rows in \(\mathbf{B}\), and \(\mathbf{C}\) has the same number of rows as \(\mathbf{A}\) and the same number of columns as \(\mathbf{B}\).
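A quick numerical check of this dimension rule (the shapes below are arbitrary):

```python
import numpy as np

A = np.random.rand(4, 3)   # p x n with p = 4, n = 3
B = np.random.rand(3, 2)   # n x m with n = 3, m = 2

C = A @ B                  # p x m product; c_kj = sum_l a_kl * b_lj
print(C.shape)             # (4, 2): rows of A, columns of B
```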
Orthogonal matrix. A square \(p \times p\) matrix \(\mathbf{A}\) is said to be orthogonal if \(\mathbf{A}\mathbf{A}' = \mathbf{A}' \mathbf{A}= \mathbf{I}_p\).
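For example, a \(2 \times 2\) rotation matrix is orthogonal, which can be verified numerically:

```python
import numpy as np

# rotation by an arbitrary angle theta is an orthogonal transformation
theta = 0.3
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(A @ A.T, np.eye(2)))   # A A' = I_2
print(np.allclose(A.T @ A, np.eye(2)))   # A' A = I_2
```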
Vector inner product. As a special case of matrix multiplication, consider two vectors, \(\mathbf{a}\) and \(\mathbf{b}\), both of length \(p\). The inner product of \(\mathbf{a}\) and \(\mathbf{b}\) is given by the scalar \(\mathbf{a}' \mathbf{b}= \mathbf{b}' \mathbf{a}\equiv \sum_{k=1}^p a_k b_k\).
Vector outer product. For another special case of matrix multiplication, consider a \(p\)-dimensional vector \(\mathbf{a}\) and a \(q\)-dimensional vector \(\mathbf{b}\). The outer product \(\mathbf{a}\mathbf{b}'\) is given by the \(p \times q\) matrix \[ \mathbf{a}\mathbf{b}' \equiv \left[ \begin{array}{cccc} a_{1} b_{1} & a_{1} b_2 & \cdots & a_1 b_q \\ a_{2} b_1 & a_{2} b_2 & \cdots & a_2 b_q \\ \vdots & \vdots & \; & \vdots \\ a_{p} b_1 & a_{p} b_2 & \cdots & a_{p} b_q \\ \end{array}\right]. \] Note that, in general, \(\mathbf{a}\mathbf{b}' \neq \mathbf{b}\mathbf{a}'\).
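Taking \(p = q\) so that both products are defined, the inner and outer products (and the fact that the outer product is not symmetric in its arguments) can be illustrated with arbitrary vectors:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])        # p = 3
b = np.array([4.0, 5.0, 6.0])        # q = 3

inner = a @ b                        # scalar: sum_k a_k * b_k
outer = np.outer(a, b)               # 3 x 3 matrix with (k, l) entry a_k * b_l
print(inner, outer.shape)
print(np.allclose(outer, np.outer(b, a)))   # False: a b' != b a' in general
```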
Kronecker product. Consider two matrices, an \(n_a \times m_a\) matrix, \(\mathbf{A}\), and an \(n_b \times m_b\) matrix, \(\mathbf{B}\). The Kronecker product of \(\mathbf{A}\) and \(\mathbf{B}\) is given by the \(n_a n_b \times m_a m_b\) matrix \(\mathbf{A}\otimes \mathbf{B}\) defined as \[ \mathbf{A}\otimes \mathbf{B}\equiv \left[\begin{array}{ccc} a_{11} \mathbf{B}& \cdots & a_{1 m_a} \mathbf{B}\\ \vdots & \ddots & \vdots \\ a_{n_a 1} \mathbf{B}& \cdots & a_{n_a m_a} \mathbf{B} \end{array}\right]. \] If \(\mathbf{A}\) is \(n_a \times n_a\) and \(\mathbf{B}\) is \(n_b \times n_b\), the inverse (assuming \(\mathbf{A}^{-1}\) and \(\mathbf{B}^{-1}\) exist) and the determinant of the Kronecker product can be expressed in terms of the inverses and determinants of the individual matrices: \[ (\mathbf{A}\otimes \mathbf{B})^{-1} = \mathbf{A}^{-1} \otimes \mathbf{B}^{-1}, \] \[ |\mathbf{A}\otimes \mathbf{B}| = |\mathbf{A}|^{n_b} \; |\mathbf{B}|^{n_a}. \]
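Both identities can be verified numerically for small invertible matrices (the \(2 \times 2\) matrices below are arbitrary):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])           # 2 x 2, invertible
B = np.array([[4.0, 0.5],
              [0.5, 1.0]])           # 2 x 2, invertible

K = np.kron(A, B)                    # 4 x 4 Kronecker product

# (A kron B)^{-1} = A^{-1} kron B^{-1}
print(np.allclose(np.linalg.inv(K),
                  np.kron(np.linalg.inv(A), np.linalg.inv(B))))

# |A kron B| = |A|^{n_b} |B|^{n_a}, with n_a = n_b = 2
print(np.isclose(np.linalg.det(K),
                 np.linalg.det(A)**2 * np.linalg.det(B)**2))
```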
Euclidean norm. Consider the \(p\)-dimensional real-valued vector \(\mathbf{a}= [a_1,a_2,\ldots,a_p]'\). The Euclidean norm of \(\mathbf{a}\) is its Euclidean distance from the origin in \(p\)-dimensional space, given by
\[ ||\mathbf{a}|| \equiv \sqrt{\mathbf{a}' \mathbf{a}} \equiv \sqrt{\sum_{k=1}^p a^2_{k}}. \]
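For example, the norm of the vector \([3, 4]'\) is 5, which can be checked in NumPy:

```python
import numpy as np

a = np.array([3.0, 4.0])
print(np.linalg.norm(a))             # 5.0
print(np.sqrt(a @ a))                # same value: sqrt(a' a)
```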
Symmetric matrix. A matrix \(\mathbf{A}\) is said to be symmetric if \(\mathbf{A}' = \mathbf{A}\).
Diagonal matrix. Consider the \(p \times p\) matrix \(\mathbf{A}\). The (main) diagonal elements of this matrix are given by the vector \([a_{11},a_{22},\ldots,a_{pp}]'\). Sometimes it is helpful to use a shorthand notation to construct a matrix with specific elements of a vector on the main diagonal and zeros for all other elements. For example,
\[ \textrm{diag}(b_1,b_2,\ldots,b_q) \equiv \left[ \begin{array}{ccccc} b_{1} & 0 & 0 & \cdots & 0\\ 0 & b_2 & 0 &\cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 &\cdots & b_q \\ \end{array}\right]. \]
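The analogous NumPy operation is np.diag, which builds a diagonal matrix from a vector (and, applied to a square matrix, extracts its main diagonal); the vector below is arbitrary.

```python
import numpy as np

b = np.array([1.0, 2.0, 3.0])
D = np.diag(b)            # 3 x 3 matrix with b on the main diagonal, zeros elsewhere
print(D)
print(np.diag(D))         # extracts the main diagonal of a square matrix
```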
Trace of a matrix. Let \(\mathbf{A}\) be a \(p \times p\) square matrix. We then define the trace of this matrix, denoted \(\textrm{trace}(\mathbf{A})\) (or \(\text{tr}(\mathbf{A})\)) as the sum of the diagonal elements of \(\mathbf{A}\); that is,
\[ \textrm{trace}(\mathbf{A}) = \sum_{k=1}^p a_{kk}. \]
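A small numerical example (arbitrary matrix):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(np.trace(A))                   # 1 + 4 = 5
print(np.sum(np.diag(A)))            # same: sum of the diagonal elements
```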
Non-negative-definite and positive-definite matrices. Consider a \(p \times p\) symmetric and real-valued matrix, \(\mathbf{A}\). If the scalar given by the quadratic form \(\mathbf{x}' \mathbf{A}\mathbf{x}\) is non-negative for every real-valued vector \(\mathbf{x}\), we say \(\mathbf{A}\) is a non-negative-definite matrix. Similarly, if \(\mathbf{x}' \mathbf{A}\mathbf{x}\) is strictly positive for every \(\mathbf{x}\neq {\mathbf{0}}\), we say that \(\mathbf{A}\) is a positive-definite matrix.
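For a symmetric matrix, a convenient numerical check uses its eigenvalues (see the spectral decomposition below): the matrix is non-negative-definite if all eigenvalues are non-negative, and positive-definite if they are all strictly positive. A small sketch with an arbitrary symmetric matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])           # symmetric

eigvals = np.linalg.eigvalsh(A)      # eigenvalues of a symmetric matrix
print(np.all(eigvals >= 0))          # True: non-negative-definite
print(np.all(eigvals > 0))           # True: positive-definite

# equivalently, the quadratic form x' A x is positive for this x != 0
x = np.array([1.0, -1.0])
print(x @ A @ x)                     # 2.0 > 0
```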
Matrix inverse. Consider the \(p \times p\) square matrix, \(\mathbf{A}\). If it exists, the matrix \(\mathbf{B}\) such that \(\mathbf{A}\mathbf{B}= \mathbf{B}\mathbf{A}= \mathbf{I}_p\) is known as the inverse matrix of \(\mathbf{A}\), and it is denoted by \(\mathbf{A}^{-1}\). Thus, \(\mathbf{A}^{-1} \mathbf{A}= \mathbf{A}\mathbf{A}^{-1} = \mathbf{I}_p\). If the inverse exists, we say that the matrix is invertible. Not every square matrix has an inverse, but every positive-definite matrix is invertible (and the inverse matrix is also positive-definite).
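Numerically, the inverse can be computed and checked as follows (using an arbitrary positive-definite matrix):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])           # positive-definite, hence invertible

A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))   # A A^{-1} = I_2
print(np.allclose(A_inv @ A, np.eye(2)))   # A^{-1} A = I_2
```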
Matrix square root. Let \(\mathbf{A}\) be a \(p \times p\) positive-definite matrix. Then there exists a matrix \(\mathbf{B}\) such that \(\mathbf{A}= \mathbf{B}\mathbf{B}\equiv \mathbf{B}^2\), and we say that \(\mathbf{B}\) is the matrix square root of \(\mathbf{A}\) and denote it by \(\mathbf{A}^{1/2}\). The matrix square root of a positive-definite matrix is also positive-definite, and we can write the inverse matrix as \(\mathbf{A}^{-1} = \mathbf{A}^{-1/2} \mathbf{A}^{-1/2}\), where \(\mathbf{A}^{-1/2}\) is the inverse of \(\mathbf{A}^{1/2}\).
Spectral decomposition. Let \(\mathbf{A}\) be a \(p \times p\) symmetric matrix of real values. This matrix can be decomposed as \[ \mathbf{A}= \sum_{k=1}^p \lambda_k \boldsymbol{\phi}_k \boldsymbol{\phi}_k' = \boldsymbol{\Phi}\boldsymbol{\Lambda}\boldsymbol{\Phi}', \] where \(\boldsymbol{\Lambda}= \textrm{diag}(\lambda_1,\ldots,\lambda_p)\), \(\boldsymbol{\Phi}= [\boldsymbol{\phi}_1,\ldots,\boldsymbol{\phi}_p]\), and \(\{\lambda_k\}\) are the eigenvalues associated with the eigenvectors \(\{\boldsymbol{\phi}_k\}\), \(k=1,\ldots,p\), which are orthonormal (i.e., \(\boldsymbol{\Phi}\boldsymbol{\Phi}' = \boldsymbol{\Phi}' \boldsymbol{\Phi}= \mathbf{I}_p\)). Note that for a symmetric non-negative-definite matrix \(\mathbf{A}\), \(\lambda_k \ge 0\), and for a symmetric positive-definite matrix \(\mathbf{A}\), \(\lambda_k > 0\), for all \(k = 1,\ldots,p\). The matrix square root and its inverse can be written as \(\mathbf{A}^{1/2} = \boldsymbol{\Phi}\textrm{diag}(\lambda_1^{1/2},\ldots,\lambda_p^{1/2}) \boldsymbol{\Phi}'\) and \(\mathbf{A}^{-1/2} = \boldsymbol{\Phi}\textrm{diag}(\lambda_1^{-1/2},\ldots,\lambda_p^{-1/2}) \boldsymbol{\Phi}'\), respectively.
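The following sketch computes the spectral decomposition of a small (arbitrary) symmetric positive-definite matrix and uses it to form the matrix square root and its inverse, as defined above and in the previous entry:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                 # symmetric positive-definite

lam, Phi = np.linalg.eigh(A)               # eigenvalues and orthonormal eigenvectors
print(np.allclose(Phi @ np.diag(lam) @ Phi.T, A))     # A = Phi Lambda Phi'
print(np.allclose(Phi @ Phi.T, np.eye(2)))            # Phi is orthogonal

# matrix square root and its inverse from the spectral decomposition
A_half = Phi @ np.diag(np.sqrt(lam)) @ Phi.T
A_neg_half = Phi @ np.diag(1.0 / np.sqrt(lam)) @ Phi.T
print(np.allclose(A_half @ A_half, A))                # A^{1/2} A^{1/2} = A
print(np.allclose(A_neg_half, np.linalg.inv(A_half))) # A^{-1/2} = (A^{1/2})^{-1}
```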
Singular value decomposition (SVD). Let \(\mathbf{A}\) be a \(p \times n\) matrix of real values. Then the matrix \(\mathbf{A}\) can be decomposed as \(\mathbf{A}= \mathbf{U}\mathbf{D}\mathbf{V}'\), where \(\mathbf{U}\) and \(\mathbf{V}\) are \(p \times p\) and \(n \times n\) orthogonal matrices, respectively. In addition, the \(p \times n\) matrix \(\mathbf{D}\) contains all zeros except for the non-negative elements \(\{d_k: \; k=1,2,\ldots,\min(p,n)\}\) in positions \((k,k)\), which are known as the singular values.
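A numerical illustration of the SVD, using an arbitrary \(4 \times 3\) matrix:

```python
import numpy as np

A = np.random.rand(4, 3)                       # p x n with p = 4, n = 3

U, d, Vt = np.linalg.svd(A)                    # U: 4 x 4, Vt = V': 3 x 3, d: singular values
D = np.zeros((4, 3))
D[:3, :3] = np.diag(d)                         # singular values in positions (k, k)

print(np.allclose(U @ D @ Vt, A))              # A = U D V'
print(np.allclose(U @ U.T, np.eye(4)))         # U is orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(3)))       # V is orthogonal
```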