Derivation of the Multivariate Normal Distribution

오태호 (Taeho Oh)
Hello, this is Taeho Oh.
In this post I derive the probability density function of the multivariate normal distribution. Many textbooks state only the final formula and omit the derivation, so I have written the derivation out here.
Throughout the derivation, matrices and vectors are written in bold. Unless stated otherwise, a vector means a column vector.

Covariance Matrix

When $\mathbf{X}$ is a vector made up of the random variables $X_1$, $X_2$, $\cdots$, $X_n$, its covariance matrix is defined as follows.
$$
\begin{aligned}
\mathbf{C_X}&=\begin{bmatrix}Var(X_1) & Cov(X_1, X_2) & \cdots & Cov(X_1, X_n) \\ Cov(X_2, X_1) & Var(X_2) & \cdots & Cov(X_2, X_n) \\ \vdots & \vdots & \ddots & \vdots \\ Cov(X_n, X_1) & Cov(X_n, X_2) & \cdots & Var(X_n)\end{bmatrix} \\
&=E((\mathbf{X}-E(\mathbf{X}))(\mathbf{X}-E(\mathbf{X}))^T)
\end{aligned}
$$
When $\mathbf{A}$ is an $n \times n$ matrix and $\mathbf{b}$ is a vector of size $n$, the covariance matrix of $\mathbf{Y}=\mathbf{A}\mathbf{X}+\mathbf{b}$ can be obtained from $\mathbf{C_X}$ as follows.
$$
\begin{aligned}
\mathbf{C_Y}&=E((\mathbf{Y}-E(\mathbf{Y}))(\mathbf{Y}-E(\mathbf{Y}))^T) \\
&=E((\mathbf{A}\mathbf{X}+\mathbf{b}-E(\mathbf{A}\mathbf{X}+\mathbf{b}))(\mathbf{A}\mathbf{X}+\mathbf{b}-E(\mathbf{A}\mathbf{X}+\mathbf{b}))^T) \\
&=E((\mathbf{A}\mathbf{X}-E(\mathbf{A}\mathbf{X}))(\mathbf{A}\mathbf{X}-E(\mathbf{A}\mathbf{X}))^T) \\
&=E(\mathbf{A}(\mathbf{X}-E(\mathbf{X}))(\mathbf{X}-E(\mathbf{X}))^T\mathbf{A}^T) \\
&=\mathbf{A}E((\mathbf{X}-E(\mathbf{X}))(\mathbf{X}-E(\mathbf{X}))^T)\mathbf{A}^T \\
&=\mathbf{A}\mathbf{C_X}\mathbf{A}^T
\end{aligned}
$$
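The identity $\mathbf{C_Y}=\mathbf{A}\mathbf{C_X}\mathbf{A}^T$ can be checked numerically. The following is a minimal sketch using only the Python standard library. The matrix `A`, the vector `b`, the way the samples are generated, and the helper names `matmul`, `transpose`, and `sample_cov` are all assumptions chosen for illustration. The same algebra applies to sample covariances, so the two matrices should agree up to floating-point rounding.

```python
import random

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def sample_cov(samples):
    """Sample covariance matrix (dividing by N - 1) of a list of vectors."""
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(dim)]
    return [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
             for j in range(dim)] for i in range(dim)]

random.seed(0)

# Correlated 2-D samples of X (the mixing used here is arbitrary).
xs = []
for _ in range(1000):
    u, v = random.gauss(0, 1), random.gauss(0, 1)
    xs.append([u, 0.5 * u + v])

# Arbitrary example transformation Y = A X + b.
A = [[2.0, 1.0], [0.0, 3.0]]
b = [1.0, -2.0]
ys = [[sum(A[i][k] * x[k] for k in range(2)) + b[i] for i in range(2)] for x in xs]

C_X = sample_cov(xs)
C_Y = sample_cov(ys)                                # covariance measured on Y directly
predicted = matmul(matmul(A, C_X), transpose(A))    # A C_X A^T

print(C_Y)
print(predicted)
```

Note that `b` has no effect on the result, which matches the step in the derivation where $\mathbf{b}$ cancels.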

Method of Transformations

When $\mathbf{Y}=G(\mathbf{X})$ for an invertible, differentiable $G$, so that $\mathbf{X}=G^{-1}(\mathbf{Y})=H(\mathbf{Y})$, the probability density function $f_{\mathbf{Y}}(\mathbf{y})$ is as follows.
$$
\begin{array}{c}
\mathbf{X}=\begin{bmatrix}X_1 \\ X_2 \\ \vdots \\ X_n\end{bmatrix}=\begin{bmatrix}H_1(\mathbf{Y}) \\ H_2(\mathbf{Y}) \\ \vdots \\ H_n(\mathbf{Y})\end{bmatrix} \\
J=\det\begin{bmatrix}\frac{\partial H_1}{\partial y_1} & \frac{\partial H_1}{\partial y_2} & \cdots & \frac{\partial H_1}{\partial y_n} \\ \frac{\partial H_2}{\partial y_1} & \frac{\partial H_2}{\partial y_2} & \cdots & \frac{\partial H_2}{\partial y_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial H_n}{\partial y_1} & \frac{\partial H_n}{\partial y_2} & \cdots & \frac{\partial H_n}{\partial y_n}\end{bmatrix} \\
f_{\mathbf{Y}}(\mathbf{y})=f_{\mathbf{X}}(H(\mathbf{y}))\left| J \right|
\end{array}
$$
Using this method, when $\mathbf{Y}=\mathbf{A}\mathbf{X}+\mathbf{b}$ with $\mathbf{A}$ invertible, the probability density function $f_{\mathbf{Y}}(\mathbf{y})$ is obtained as follows.
$$
\begin{aligned}
\mathbf{Y}&=\mathbf{A}\mathbf{X}+\mathbf{b}=G(\mathbf{X}) \\
\mathbf{X}&=\mathbf{A}^{-1}(\mathbf{Y}-\mathbf{b})=H(\mathbf{Y}) \\
J&=\det(\mathbf{A}^{-1})=\frac{1}{\det(\mathbf{A})} \\
f_{\mathbf{Y}}(\mathbf{y})&=f_{\mathbf{X}}(H(\mathbf{y}))\left| J \right|=f_{\mathbf{X}}(\mathbf{A}^{-1}(\mathbf{y}-\mathbf{b}))\left| \frac{1}{\det(\mathbf{A})} \right|
\end{aligned}
$$
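To make the formula concrete, here is a small sketch for the two-dimensional case. It assumes, purely for illustration, that $\mathbf{X}$ is uniform on the unit square, so $f_{\mathbf{X}}$ equals 1 inside the square and 0 outside. The matrix `A`, the vector `b`, and the helper names `det2` and `inv2` are likewise hypothetical. Under this assumption $f_{\mathbf{Y}}$ should equal $1/\left|\det(\mathbf{A})\right|$ on the image of the square and 0 elsewhere.

```python
def det2(A):
    """Determinant of a 2x2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inv2(A):
    """Inverse of a 2x2 matrix (assumes det(A) != 0)."""
    d = det2(A)
    return [[A[1][1] / d, -A[0][1] / d],
            [-A[1][0] / d, A[0][0] / d]]

def f_X(x):
    """Density of X: uniform on the unit square [0, 1] x [0, 1]."""
    return 1.0 if 0.0 <= x[0] <= 1.0 and 0.0 <= x[1] <= 1.0 else 0.0

def f_Y(y, A, b):
    """Density of Y = A X + b via f_Y(y) = f_X(A^{-1}(y - b)) / |det(A)|."""
    Ainv = inv2(A)
    d = [y[0] - b[0], y[1] - b[1]]
    x = [Ainv[0][0] * d[0] + Ainv[0][1] * d[1],
         Ainv[1][0] * d[0] + Ainv[1][1] * d[1]]
    return f_X(x) / abs(det2(A))

A = [[2.0, 1.0], [0.0, 3.0]]   # det(A) = 6
b = [1.0, -2.0]

inside = f_Y([2.5, -0.5], A, b)     # y is the image of x = (0.5, 0.5)
outside = f_Y([10.0, 10.0], A, b)   # y maps back to a point outside the square

print(inside, outside)
```

The factor $1/\left|\det(\mathbf{A})\right|$ reflects that the transformation scales areas by $\left|\det(\mathbf{A})\right|$, so the density must shrink by the same factor to keep the total probability equal to 1.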

Standard Normal Distribution

When $Z_i \sim N(0,1)$, the probability density function $f_{Z_i}(z_i)$ is as follows (see Normal Distribution).
$$
f_{Z_i}(z_i)=\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}z_i^2\right)
$$
When the $Z_i \sim N(0,1)$ are iid and $\mathbf{Z}=\begin{bmatrix} Z_1 & Z_2 & \cdots & Z_n \end{bmatrix}^T$, the joint density $f_{\mathbf{Z}}(\mathbf{z})$ is as follows.
$$
\begin{aligned}
f_{\mathbf{Z}}(\mathbf{z})&=\prod_{i=1}^{n}f_{Z_i}(z_i) \\
&=\frac{1}{(2\pi)^{\frac{n}{2}}}\exp\left(-\frac{1}{2}\sum_{i=1}^n z_i^2\right) \\
&=\frac{1}{(2\pi)^{\frac{n}{2}}}\exp\left(-\frac{1}{2}\mathbf{z}^T\mathbf{z}\right)
\end{aligned}
$$
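The equality between the product form and the vector form can be confirmed with a few lines of Python. This is only a sketch: the function names and the test point `z` are arbitrary choices made for illustration.

```python
import math

def std_normal_pdf(z):
    """Univariate standard normal density f_{Z_i}(z_i)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def joint_pdf_product(z):
    """Joint density as the product of the n univariate densities."""
    p = 1.0
    for zi in z:
        p *= std_normal_pdf(zi)
    return p

def joint_pdf_vector(z):
    """Joint density written as (2 pi)^(-n/2) * exp(-z^T z / 2)."""
    n = len(z)
    return math.exp(-0.5 * sum(zi * zi for zi in z)) / (2.0 * math.pi) ** (n / 2)

z = [0.3, -1.2, 2.0]   # arbitrary evaluation point, n = 3
print(joint_pdf_product(z), joint_pdf_vector(z))
```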

Multivariate Normal Distribution

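When $\mathbf{X}=\mathbf{A}\mathbf{Z}+\mathbf{m}$ with $\mathbf{A}$ an invertible $n \times n$ matrix, $f_{\mathbf{X}}(\mathbf{x})$ is obtained as follows. Because the $Z_i$ are iid with variance 1, $\mathbf{C_Z}$ is the identity matrix.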
$$
\begin{aligned}
E(\mathbf{X})&=\mathbf{m} \\
\mathbf{C_X}&=\mathbf{A}\mathbf{C_Z}\mathbf{A}^T=\mathbf{A}\mathbf{A}^T \\
\det(\mathbf{C_X})&=\det(\mathbf{A}\mathbf{A}^T)=\det(\mathbf{A})\det(\mathbf{A}^T)=(\det(\mathbf{A}))^2 \\
\sqrt{\det(\mathbf{C_X})}&=\left| \det(\mathbf{A}) \right| \\
f_{\mathbf{X}}(\mathbf{x})&=f_{\mathbf{Z}}(H(\mathbf{x}))\left| J \right| \\
&=f_{\mathbf{Z}}(\mathbf{A}^{-1}(\mathbf{x}-\mathbf{m}))\left| \frac{1}{\det(\mathbf{A})} \right| \\
&=\frac{1}{(2\pi)^{\frac{n}{2}}}\left| \frac{1}{\det(\mathbf{A})} \right| \exp\left(-\frac{1}{2}(\mathbf{A}^{-1}(\mathbf{x}-\mathbf{m}))^T(\mathbf{A}^{-1}(\mathbf{x}-\mathbf{m}))\right) \\
&=\frac{1}{(2\pi)^{\frac{n}{2}}\sqrt{\det(\mathbf{C_X})}}\exp\left(-\frac{1}{2}(\mathbf{x}-\mathbf{m})^T(\mathbf{A}\mathbf{A}^T)^{-1}(\mathbf{x}-\mathbf{m})\right) \\
&=\frac{1}{(2\pi)^{\frac{n}{2}}\sqrt{\det(\mathbf{C_X})}}\exp\left(-\frac{1}{2}(\mathbf{x}-\mathbf{m})^T\mathbf{C_X}^{-1}(\mathbf{x}-\mathbf{m})\right)
\end{aligned}
$$
In practice we usually have $\mathbf{X}$ in the form of data and do not know $\mathbf{A}$. Looking at $f_{\mathbf{X}}(\mathbf{x})$, however, $\mathbf{A}$ is not needed: $\mathbf{C_X}$ alone is enough, and $\mathbf{C_X}$ can be computed easily from the data for $\mathbf{X}$. As the dimension of $\mathbf{X}$ grows, though, computing $\mathbf{C_X}^{-1}$ becomes increasingly difficult.
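One common way to evaluate this density without forming $\mathbf{C_X}^{-1}$ explicitly is to use a Cholesky factorization $\mathbf{C_X}=\mathbf{L}\mathbf{L}^T$. This is a technique I am swapping in here, not part of the derivation above. The lower-triangular $\mathbf{L}$ is one valid choice of $\mathbf{A}$, so $\mathbf{w}=\mathbf{L}^{-1}(\mathbf{x}-\mathbf{m})$ plays the role of $\mathbf{z}$, and $\sqrt{\det(\mathbf{C_X})}$ is the product of the diagonal entries of $\mathbf{L}$. The sketch below is pure Python and assumes $\mathbf{C_X}$ is symmetric positive definite. The names `cholesky`, `forward_solve`, and `mvn_pdf` and the example values are hypothetical.

```python
import math

def cholesky(C):
    """Lower-triangular L with L L^T = C, for symmetric positive-definite C."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(C[i][i] - s)
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    return L

def forward_solve(L, d):
    """Solve L w = d for lower-triangular L by forward substitution."""
    w = []
    for i in range(len(d)):
        w.append((d[i] - sum(L[i][k] * w[k] for k in range(i))) / L[i][i])
    return w

def mvn_pdf(x, m, C):
    """Multivariate normal density at x with mean m and covariance C."""
    n = len(x)
    L = cholesky(C)
    d = [xi - mi for xi, mi in zip(x, m)]
    w = forward_solve(L, d)             # w = L^{-1}(x - m), the analogue of z
    quad = sum(wi * wi for wi in w)     # equals (x - m)^T C^{-1} (x - m)
    sqrt_det = 1.0
    for i in range(n):
        sqrt_det *= L[i][i]             # product of diagonal = sqrt(det(C))
    return math.exp(-0.5 * quad) / ((2.0 * math.pi) ** (n / 2) * sqrt_det)

# Example values chosen only for illustration.
C = [[4.0, 2.0], [2.0, 3.0]]
m = [1.0, 2.0]
p = mvn_pdf([2.0, 1.0], m, C)
print(p)
```

For this 2x2 example the quadratic form can be checked by hand: $\det(\mathbf{C_X})=8$ and $(\mathbf{x}-\mathbf{m})^T\mathbf{C_X}^{-1}(\mathbf{x}-\mathbf{m})=11/8$.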

Bivariate Normal Distribution

When $X$ and $Y$ are jointly normal with $X \sim N(\mu_X, \sigma_X^2)$, $Y \sim N(\mu_Y, \sigma_Y^2)$, and $\rho=\frac{Cov(X,Y)}{\sigma_X\sigma_Y}$, the joint density $f_{XY}(x,y)$ is obtained as follows. Because this is a special case of the multivariate normal distribution, it follows from the multivariate formula without much difficulty.
$$
\begin{aligned}
n&=2 \\
\mathbf{X}&=\begin{bmatrix}X \\ Y\end{bmatrix} \\
\mathbf{x}&=\begin{bmatrix}x \\ y\end{bmatrix} \\
\mathbf{m}&=\begin{bmatrix}\mu_X \\ \mu_Y\end{bmatrix} \\
\mathbf{C_X}&=\begin{bmatrix}Var(X) & Cov(X,Y) \\ Cov(Y,X) & Var(Y)\end{bmatrix}=\begin{bmatrix}\sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2\end{bmatrix} \\
\det(\mathbf{C_X})&=\sigma_X^2\sigma_Y^2(1-\rho^2) \\
\mathbf{C_X}^{-1}&=\frac{1}{\sigma_X^2\sigma_Y^2(1-\rho^2)}\begin{bmatrix}\sigma_Y^2 & -\rho\sigma_X\sigma_Y \\ -\rho\sigma_X\sigma_Y & \sigma_X^2\end{bmatrix} \\
f_{XY}(x,y)&=f_{\mathbf{X}}(\mathbf{x}) \\
&=\frac{1}{(2\pi)^{\frac{n}{2}}\sqrt{\det(\mathbf{C_X})}}\exp\left(-\frac{1}{2}(\mathbf{x}-\mathbf{m})^T\mathbf{C_X}^{-1}(\mathbf{x}-\mathbf{m})\right) \\
&=\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left(-\frac{1}{2}\frac{1}{\sigma_X^2\sigma_Y^2(1-\rho^2)}\begin{bmatrix}x-\mu_X \\ y-\mu_Y\end{bmatrix}^T\begin{bmatrix}\sigma_Y^2 & -\rho\sigma_X\sigma_Y \\ -\rho\sigma_X\sigma_Y & \sigma_X^2\end{bmatrix}\begin{bmatrix}x-\mu_X \\ y-\mu_Y\end{bmatrix}\right) \\
&=\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left(-\frac{1}{2(1-\rho^2)}\left(\left(\frac{x-\mu_X}{\sigma_X}\right)^2+\left(\frac{y-\mu_Y}{\sigma_Y}\right)^2-2\rho\frac{(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y}\right)\right)
\end{aligned}
$$
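As a sanity check on the last simplification, the closed-form bivariate density can be compared with the matrix form evaluated directly for $n=2$. The following sketch uses hypothetical function names and arbitrary parameter values, and it assumes $\left|\rho\right|<1$ so that $\mathbf{C_X}$ is invertible.

```python
import math

def bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Closed-form bivariate normal density (final line of the derivation)."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    q = (zx * zx + zy * zy - 2.0 * rho * zx * zy) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / norm

def matrix_form_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Same density from the matrix formula, with the 2x2 inverse written out."""
    c11 = sigma_x ** 2
    c12 = rho * sigma_x * sigma_y
    c22 = sigma_y ** 2
    det = c11 * c22 - c12 * c12
    dx, dy = x - mu_x, y - mu_y
    # (x - m)^T C^{-1} (x - m) with C^{-1} = [[c22, -c12], [-c12, c11]] / det
    quad = (c22 * dx * dx - 2.0 * c12 * dx * dy + c11 * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

# Arbitrary illustrative parameters.
params = dict(mu_x=1.0, mu_y=-0.5, sigma_x=2.0, sigma_y=0.7, rho=0.6)
closed = bivariate_normal_pdf(0.3, 0.1, **params)
matrix = matrix_form_pdf(0.3, 0.1, **params)
print(closed, matrix)
```

With $\rho=0$ the closed form factors into the product of the two univariate normal densities, which is a useful second check.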