# Cholesky Decomposition

Published:

Cholesky decomposition is the factorization of a positive-definite, Hermitian matrix into the product of a lower triangular matrix and its conjugate transpose. It is useful in numerous applications including solving linear systems, optimization, and monte-carlo simulations.

## Formulation

The formulation of a Cholesky decomposition is fairly straight-forward. Let $$A$$ be a positive-definite, Hermitian matrix. Then

$A = LL^*$

is the Cholesky decomposition, where $$L$$ is lower-triangular and $$L^*$$ is the conjugate transpose of $$L$$.

If $$A$$ is in fact positive-definite, then $$L$$ is unique.

## Algorithm

For the sake of simplicity assume, $$A$$ is a real, symmetric $$3\times 3$$ matrix. Then we can write out the equation $$A = LL^{T}$$ element-wise,

\begin{aligned} A &= \begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix} \cdot \begin{bmatrix} l_{11} & l_{21} & l_{31} \\ 0 & l_{22} & l_{32} \\ 0 & 0 & l_{33} \end{bmatrix} \\ &= \begin{bmatrix} l_{11}^2 & l_{21}l_{11} & l_{31}l_{11} \\ l_{21}l_{11} & l_{21}^2 + l_{22}^2 & l_{31}l_{21} + l_{32}l_{22} \\ l_{31}l_{11} & l_{31}l_{21} + l_{32}l_{22} & l_{31}^2 + l_{32}^2 + l_{33}^2 \end{bmatrix} \end{aligned}

Now that we have some representation of $$A$$ we can derive $$L$$ from the values of $$A$$, which gives

$L = \begin{bmatrix} \sqrt{a_{11}} & 0 & 0 \\ a_{21}/l_{11} & \sqrt{a_{22} - l_{21}^2} & 0 \\ a_{31}/l_{11} & (a_{32} - l_{31}l_{21})/l_{22} & \sqrt{a_{33} - l_{31}^2 - l_{32}^2} \end{bmatrix}$

From this you can see a pattern in the elements of $$L$$. For instance, the diagonals are the square root of that element of $$A$$ minus the squares of all the elements before it in the row. Using these patterns we can define each value of $$L$$ by its index.

\begin{aligned} l_{j,j} =& (\pm) \sqrt{ a_{j,j} - \sum_{k=1}^{j-1} l_{j,k}^2 } \\ l_{i,j} =& \frac{1}{l_{j,j}} \left( a_{i,j} - \sum_{k=1}^{j-1} l_{i,k}l_{j,k} \right) \end{aligned}

Each value of $$L$$ is dependent of the values above and to the left. Likewise computations must be done descending from the top left corner of the matrix.

It is quite trivial to derive an $$\mathcal{O}(n^3)$$ algorithm from these update rules.

for k=0...N

A[k,k] = sqrt(A[k,k])
for i=k+1...n
A[k,i] /= A[k,k]

for j=k+1...n
for i=j...n
A[j,i] -= A[k,j] * A[k,i]


## Applications

Cholesky decomposition actually has many applications including linear systems, linear least-squares, monte-carlo simulations, Kalman filters, and matrix inversion.

### Linear Systems

The most common use of decomposition is solving the linear system $$A \bm{x}=\bm{b}$$. If $$A$$ is real symmetric, positive-definite then we can use Cholesky to solve.

First, let $$A\bm{x}=\bm{b}$$. Cholesky decomposition gives us $$A=LL^T$$. Now, using forward substitution, we can solve $$L\bm{y}=\bm{b}$$ for $$\bm{y}$$ and, using back substitution, solve $$L^T\bm{x}=\bm{y}$$ for $$\bm{x}$$.

### Least-Squares

Linear least squares problems are formulated as finding $$\hat{\bm{\theta}} = \argmin_{\bm \theta}{\vert|\bm{y}-X\bm{\theta} \vert|^2}$$ where $$\hat{\bm{y}}=X\hat{\bm{\theta}}$$ and $$X\in\mathbb{R}^{n\times n}$$, $$\bm{\theta},\bm{y}\in\mathbb{R}^n$$. A solution exists for this using the normal equations. Namely,

\begin{aligned} (X^TX)\hat{\bm{\theta}} = X^T\bm{y} \\ \implies \hat{\bm{\theta}} = (X^TX)^{-1}X^T\bm{y} . \end{aligned}

Now $$X^TX$$ is symmetric positive definite so it can be factored into $$X^TX = LL^T$$. Then it just comes down to solving $$L^T\bm{y}=X^T\bm{\theta}$$ using forward substitution and $$L\bm{\theta}=\bm{y}$$ using back substitution.

## Implementations

### Serial

Translating the pseudocode algorithm into C/C++ is fairly straightforward. Here’s an example C++ implementation:

template <typename T>
void serial_potrf(la::Matrix<T> &A)
{

for (std::size_t k = 0; k < A.rows(); k++)
{
A.at(k, k) = std::sqrt(A.at(k, k));
for (std::size_t i = k + 1; i < A.rows(); i++)
{
A.at(i, k) /= A.at(k, k);
}

for (std::size_t j = k + 1; j < A.rows(); j++)
{
for (std::size_t i = j; i < A.rows(); i++)
{
A.at(i, j) -= A.at(j, k) * A.at(i, k);
}
}
}
}


You’ll note that this writes $$L$$ in-place into $$A$$. It also only computes $$L$$. The upper-right triangle (excluding diagonal) are all junk values.