Meet Our Team
home

Derivation of LBFGS Part 3 - BFGS Method

오태호 (Taeho Oh)
오태호 (Taeho Oh)
안녕하세요. 오태호입니다.
이 글에서는 Optimization 기법중에 하나인 LBFGS Method(Limited Memory Broyden–Fletcher–Goldfarb–Shanno Method)의 수식을 다음과 같이 4개의 Part에 걸쳐서 차근차근 유도하여 LBFGS Method의 구조를 조금 깊게 살펴보도록 하겠습니다.
Derivation of LBFGS Part 3 - BFGS Method

BFGS Method

Quasi Newton Method중에 하나인 SR1 Method는 Hk+1\mathbf{H}_{k+1}을 안정적으로 구할 수 없는 문제가 있습니다. Hk+1\mathbf{H}_{k+1}을 안정적으로 구하기 위해 SR1 Method를 조금 개량한 BFGS Method(Broyden Fletcher Goldfarb Shanno Method)를 알아보겠습니다.
SR1 Method에서는 Rank 1 Matrix (aukukTa \mathbf{u}_{k} \mathbf{u}_{k}^{T})를 사용해서 Bk+1\mathbf{B}_{k+1}을 구했는데 BFGS Method에서는 다음과 같이 Rank 2 Matrix (aukukT+bvkvkTa \mathbf{u}_{k} \mathbf{u}_{k}^{T}+b \mathbf{v}_{k} \mathbf{v}_{k}^{T})를 사용해서 Bk+1\mathbf{B}_{k+1}을 구합니다.
Bk+1=Bk+aukukT+bvkvkT\mathbf{B}_{k+1}=\mathbf{B}_{k}+a \mathbf{u}_{k} \mathbf{u}_{k}^{T}+b \mathbf{v}_{k} \mathbf{v}_{k}^{T}
uk=yk\mathbf{u}_{k}=\boldsymbol{y}_{k}, vk=Bksk\mathbf{v}_{k}=\mathbf{B}_{k} \mathbf{s}_{k}로 설정하고 aukukT+bvkvkTa \mathbf{u}_{k} \mathbf{u}_{k}^{T}+b \mathbf{v}_{k} \mathbf{v}_{k}^{T}가 구체적으로 어떤 값을 가지는지 Secant Equation을 이용해서 계산해 보겠습니다.
Bk+1=Bk+aukukT+bvkvkTBk+1sk=yk(Bk+aukukT+bvkvkT)sk=yk(aukTsk)uk+(bvkTsk)vk=ykBksk(aukTsk)uk=yk=uk(bvkTsk)vk=Bksk=vka=1ukTsk=1ykTskb=1vkTsk=1skTBkTsk=1skTBksk\begin{array}{l}\mathbf{B}_{k+1}=\mathbf{B}_{k}+a \mathbf{u}_{k} \mathbf{u}_{k}^{T}+b \mathbf{v}_{k} \mathbf{v}_{k}^{T} \\\mathbf{B}_{k+1} \mathbf{s}_{k}=\mathbf{y}_{k} \\\left(\mathbf{B}_{k}+a \mathbf{u}_{k} \mathbf{u}_{k}^{T}+b \mathbf{v}_{k} \mathbf{v}_{k}^{T}\right) \mathbf{s}_{k}=\mathbf{y}_{k} \\\left(a \mathbf{u}_{k}^{T} \mathbf{s}_{k}\right) \mathbf{u}_{k}+\left(b \mathbf{v}_{k}^{T} \mathbf{s}_{k}\right) \mathbf{v}_{k}=\mathbf{y}_{k}-\mathbf{B}_{k} \mathbf{s}_{k} \\\left(a \mathbf{u}_{k}^{T} \mathbf{s}_{k}\right) \mathbf{u}_{k}=\mathbf{y}_{k}=\mathbf{u}_{k} \\\left(b \mathbf{v}_{k}^{T} \mathbf{s}_{k}\right) \mathbf{v}_{k}=-\mathbf{B}_{k} \mathbf{s}_{k}=-\mathbf{v}_{k} \\a=\frac{1}{\mathbf{u}_{k}^{T} \mathbf{s}_{k}}=\frac{1}{\mathbf{y}_{k}^{T} \mathbf{s}_{k}} \\b=-\frac{1}{\mathbf{v}_{k}^{T} \mathbf{s}_{k}}=-\frac{1}{\mathbf{s}_{k}^{T} \mathbf{B}_{k}^{T} \mathbf{s}_{k}}=-\frac{1}{\mathbf{s}_{k}^{T} \mathbf{B}_{k} \mathbf{s}_{k}}\end{array}
정리하면 다음과 같습니다.
Bk+1=BkBkskskTBkskTBksk+ykykTykTsk\mathbf{B}_{k+1}=\mathbf{B}_{k}-\frac{\mathbf{B}_{k} \mathbf{s}_{k} \mathbf{s}_{k}^{T} \mathbf{B}_{k}}{\mathbf{s}_{k}^{T} \mathbf{B}_{k} \mathbf{s}_{k}}+\frac{\mathbf{y}_{k} \mathbf{y}_{k}^{T}}{\mathbf{y}_{k}^{T} \mathbf{s}_{k}}
이제 여기서 구한 Bk+1\mathbf{B}_{k+1}을 이용해서 Hk+1\mathbf{H}_{k+1}를 구해야 합니다. Bk+1\mathbf{B}_{k+1}을 이용해서  Hk+1\mathbf{H}_{k+1}를 구하기 위해 Sherman Morrison Woodbury Formula을 이용합니다.
Sherman Morrison Woodbury Formula는 다음과 같습니다.
(A+UCV)1=A1A1U(C1+VA1U)1VA1(\mathbf{A}+\mathbf{U C V})^{-1}=\mathbf{A}^{-1}-\mathbf{A}^{-1} \mathbf{U}\left(\mathbf{C}^{-1}+\mathbf{V A}^{-1} \mathbf{U}\right)^{-1} \mathbf{V} \mathbf{A}^{-1}
Hk+1\mathbf{H}_{k+1}은 다음과 같이 구합니다.
A=BkC=IU=[Bkskyk]V=[1skTBksk001ykTsk][skTBkykT]\begin{aligned}\mathbf{A} & =\mathbf{B}_{k} \\\mathbf{C} & =\mathbf{I} \\\mathbf{U} & =\left[\begin{array}{ll}\mathbf{B}_{k} \mathbf{s}_{k} & \mathbf{y}_{k}\end{array}\right] \\\mathbf{V} & =\left[\begin{array}{cc}-\frac{1}{\mathbf{s}_{k}^{T} \mathbf{B}_{k} \mathbf{s}_{k}} & 0 \\0 & \frac{1}{\mathbf{y}_{k}^{T} \mathbf{s}_{k}}\end{array}\right]\left[\begin{array}{c}\mathbf{s}_{k}^{T} \mathbf{B}_{k} \\\mathbf{y}_{k}^{T}\end{array}\right]\end{aligned}
Hk+1=Bk+11=(Bk+aukukT+bvkvkT)1=(BkBkskskTBkskTBksk+ykykTykTsk)1=(A+UCV)1=(Bk+[Bkskyk][1skTBksk001ykTsk][skTBkykT])1=Bk1Bk1[Bkskyk](I+[1skTBksk001ykTsk][skTBkykT]Bk1[Bkskyk])1[1skTBksk001ykTsk][skTBkykT]Bk1=Hk[skHkyk]([skTBksk00ykTsk]+[skTBkykT]Hk[Bkskyk])1[skTykTHk]=Hk[skHkyk]([skTBksk00ykTsk]+[skTBkskskTykykTskykTHkyk])1[skTykTHk]Hk+1=Bk+11=(Bk+aukukT+bvkvkT)1=(BkBkskskTBkskTBksk+ykykTykTsk)1=(A+UCV)1=(Bk+[Bkskyk][1skTBksk001ykTsk][skTBkykT])1=Bk1Bk1[Bkskyk](I+[1skTBksk001ykTsk][skTBkykT]Bk1[Bkskyk])1[1skTBksk001ykTsk][skTBkykT]Bk1=Hk[skHkyk]([skTBksk00ykTsk]+[skTBkykT]Hk[Bkskyk])1[skTykTHk]=Hk[skHkyk]([skTBksk00ykTsk]+[skTBkskskTykykTskykTHkyk])1[skTykTHk]\begin{array}{l}\begin{array}{l}\mathbf{H}_{k+1}=\mathbf{B}_{k+1}^{-1} \\=\left(\mathbf{B}_{k}+a \mathbf{u}_{k} \mathbf{u}_{k}^{T}+b \mathbf{v}_{k} \mathbf{v}_{k}^{T}\right)^{-1} \\=\left(\mathbf{B}_{k}-\frac{\mathbf{B}_{k} \mathbf{s}_{k} \mathbf{s}_{k}^{T} \mathbf{B}_{k}}{\mathbf{s}_{k}^{T} \mathbf{B}_{k} \mathbf{s}_{k}}+\frac{\mathbf{y}_{k} \mathbf{y}_{k}^{T}}{\mathbf{y}_{k}^{T} \mathbf{s}_{k}}\right)^{-1} \\=(\mathbf{A}+\mathbf{U C V})^{-1} \\=\left(\mathbf{B}_{k}+\left[\begin{array}{ll}\mathbf{B}_{k} \mathbf{s}_{k} & \mathbf{y}_{k}\end{array}\right]\left[\begin{array}{cc}-\frac{1}{\mathbf{s}_{k}^{T} \mathbf{B}_{k} \mathbf{s}_{k}} & 0 \\0 & \frac{1}{\mathbf{y}_{k}^{T} \mathbf{s}_{k}}\end{array}\right]\left[\begin{array}{c}\mathbf{s}_{k}^{T} \mathbf{B}_{k} \\\mathbf{y}_{k}^{T}\end{array}\right]\right)^{-1} \\=\mathbf{B}_{k}^{-1}-\mathbf{B}_{k}^{-1}\left[\begin{array}{ll}\mathbf{B}_{k} \mathbf{s}_{k} & \mathbf{y}_{k}\end{array}\right]\left(\mathbf{I}+\left[\begin{array}{cc}-\frac{1}{\mathbf{s}_{k}^{T} \mathbf{B}_{k} \mathbf{s}_{k}} & 0 \\0 & \frac{1}{\mathbf{y}_{k}^{T} \mathbf{s}_{k}}\end{array}\right]\left[\begin{array}{c}\mathbf{s}_{k}^{T} \mathbf{B}_{k} \\\mathbf{y}_{k}^{T}\end{array}\right] \mathbf{B}_{k}^{-1}\left[\begin{array}{ll}\mathbf{B}_{k} \mathbf{s}_{k} & \mathbf{y}_{k}\end{array}\right]\right)^{-1}\left[\begin{array}{cc}-\frac{1}{\mathbf{s}_{k}^{T} \mathbf{B}_{k} \mathbf{s}_{k}} & 0 \\0 & \frac{1}{\mathbf{y}_{k}^{T} \mathbf{s}_{k}}\end{array}\right]\left[\begin{array}{c}\mathbf{s}_{k}^{T} \mathbf{B}_{k} \\\mathbf{y}_{k}^{T}\end{array}\right] \mathbf{B}_{k}^{-1}\end{array}\\\begin{array}{l}=\mathbf{H}_{k}-\left[\begin{array}{ll}\mathbf{s}_{k} & \mathbf{H}_{k} \mathbf{y}_{k}\end{array}\right]\left(\left[\begin{array}{cc}-\mathbf{s}_{k}^{T} \mathbf{B}_{k} \mathbf{s}_{k} & 0 \\0 & \mathbf{y}_{k}^{T} \mathbf{s}_{k}\end{array}\right]+\left[\begin{array}{c}\mathbf{s}_{k}^{T} \mathbf{B}_{k} \\\mathbf{y}_{k}^{T}\end{array}\right] \mathbf{H}_{k}\left[\begin{array}{ll}\mathbf{B}_{k} \mathbf{s}_{k} & \mathbf{y}_{k}\end{array}\right]\right)^{-1}\left[\begin{array}{c}\mathbf{s}_{k}^{T} \\\mathbf{y}_{k}^{T} \mathbf{H}_{k}\end{array}\right] \\=\mathbf{H}_{k}-\left[\begin{array}{ll}\mathbf{s}_{k} & \mathbf{H}_{k} \mathbf{y}_{k}\end{array}\right]\left(\left[\begin{array}{cc}-\mathbf{s}_{k}^{T} \mathbf{B}_{k} \mathbf{s}_{k} & 0 \\0 & \mathbf{y}_{k}^{T} \mathbf{s}_{k}\end{array}\right]+\left[\begin{array}{cc}\mathbf{s}_{k}^{T} \mathbf{B}_{k} \mathbf{s}_{k} & \mathbf{s}_{k}^{T} \mathbf{y}_{k} \\\mathbf{y}_{k}^{T} \mathbf{s}_{k} & \mathbf{y}_{k}^{T} \mathbf{H}_{k} \mathbf{y}_{k}\end{array}\right]\right)^{-1}\left[\begin{array}{c}\mathbf{s}_{k}^{T} \\\mathbf{y}_{k}^{T} \mathbf{H}_{k}\end{array}\right]\end{array}\\\mathbf{H}_{k+1}=\mathbf{B}_{k+1}^{-1}\\=\left(\mathbf{B}_{k}+a \mathbf{u}_{k} \mathbf{u}_{k}^{T}+b \mathbf{v}_{k} \mathbf{v}_{k}^{T}\right)^{-1}\\=\left(\mathbf{B}_{k}-\frac{\mathbf{B}_{k} \mathbf{s}_{k} \mathbf{s}_{k}^{T} \mathbf{B}_{k}}{\mathbf{s}_{k}^{T} \mathbf{B}_{k} \mathbf{s}_{k}}+\frac{\mathbf{y}_{k} \mathbf{y}_{k}^{T}}{\mathbf{y}_{k}^{T} \mathbf{s}_{k}}\right)^{-1}\\=(\mathbf{A}+\mathbf{U C V})^{-1}\\=\left(\mathbf{B}_{k}+\left[\begin{array}{ll}\mathbf{B}_{k} \mathbf{s}_{k} & \mathbf{y}_{k}\end{array}\right]\left[\begin{array}{cc}-\frac{1}{\mathbf{s}_{k}^{T} \mathbf{B}_{k} \mathbf{s}_{k}} & 0 \\0 & \frac{1}{\mathbf{y}_{k}^{T} \mathbf{s}_{k}}\end{array}\right]\left[\begin{array}{c}\mathbf{s}_{k}^{T} \mathbf{B}_{k} \\\mathbf{y}_{k}^{T}\end{array}\right]\right)^{-1}\\=\mathbf{B}_{k}^{-1}-\mathbf{B}_{k}^{-1}\left[\begin{array}{ll}\mathbf{B}_{k} \mathbf{s}_{k} & \mathbf{y}_{k}\end{array}\right]\left(\mathbf{I}+\left[\begin{array}{cc}-\frac{1}{\mathbf{s}_{k}^{T} \mathbf{B}_{k} \mathbf{s}_{k}} & 0 \\0 & \frac{1}{\mathbf{y}_{k}^{T} \mathbf{s}_{k}}\end{array}\right]\left[\begin{array}{c}\mathbf{s}_{k}^{T} \mathbf{B}_{k} \\\mathbf{y}_{k}^{T}\end{array}\right] \mathbf{B}_{k}^{-1}\left[\begin{array}{ll}\mathbf{B}_{k} \mathbf{s}_{k} & \mathbf{y}_{k}\end{array}\right]\right)^{-1}\left[\begin{array}{cc}-\frac{1}{\mathbf{s}_{k}^{T} \mathbf{B}_{k} \mathbf{s}_{k}} & 0 \\0 & \frac{1}{\mathbf{y}_{k}^{T} \mathbf{s}_{k}}\end{array}\right]\left[\begin{array}{c}\mathbf{s}_{k}^{T} \mathbf{B}_{k} \\\mathbf{y}_{k}^{T}\end{array}\right] \mathbf{B}_{k}^{-1}\\=\mathbf{H}_{k}-\left[\begin{array}{ll}\mathbf{s}_{k} & \mathbf{H}_{k} \mathbf{y}_{k}\end{array}\right]\left(\left[\begin{array}{cc}-\mathbf{s}_{k}^{T} \mathbf{B}_{k} \mathbf{s}_{k} & 0 \\0 & \mathbf{y}_{k}^{T} \mathbf{s}_{k}\end{array}\right]+\left[\begin{array}{c}\mathbf{s}_{k}^{T} \mathbf{B}_{k} \\\mathbf{y}_{k}^{T}\end{array}\right] \mathbf{H}_{k}\left[\begin{array}{ll}\mathbf{B}_{k} \mathbf{s}_{k} & \mathbf{y}_{k}\end{array}\right]\right)^{-1}\left[\begin{array}{c}\mathbf{s}_{k}^{T} \\\mathbf{y}_{k}^{T} \mathbf{H}_{k}\end{array}\right]\\=\mathbf{H}_{k}-\left[\begin{array}{ll}\mathbf{s}_{k} & \mathbf{H}_{k} \mathbf{y}_{k}\end{array}\right]\left(\left[\begin{array}{cc}-\mathbf{s}_{k}^{T} \mathbf{B}_{k} \mathbf{s}_{k} & 0 \\0 & \mathbf{y}_{k}^{T} \mathbf{s}_{k}\end{array}\right]+\left[\begin{array}{cc}\mathbf{s}_{k}^{T} \mathbf{B}_{k} \mathbf{s}_{k} & \mathbf{s}_{k}^{T} \mathbf{y}_{k} \\\mathbf{y}_{k}^{T} \mathbf{s}_{k} & \mathbf{y}_{k}^{T} \mathbf{H}_{k} \mathbf{y}_{k}\end{array}\right]\right)^{-1}\left[\begin{array}{c}\mathbf{s}_{k}^{T} \\\mathbf{y}_{k}^{T} \mathbf{H}_{k}\end{array}\right]\end{array}
정리하면 다음과 같습니다.
Hk+1=(IskykTykTsk)Hk(IykskTykTsk)+skskTykTsk\mathbf{H}_{k+1}=\left(\mathbf{I}-\frac{\mathbf{s}_{k} \mathbf{y}_{k}^{T}}{\mathbf{y}_{k}^{T} \mathbf{s}_{k}}\right) \mathbf{H}_{k}\left(\mathbf{I}-\frac{\mathbf{y}_{k} \mathbf{s}_{k}^{T}}{\mathbf{y}_{k}^{T} \mathbf{s}_{k}}\right)+\frac{\mathbf{s}_{k} \mathbf{s}_{k}^{T}}{\mathbf{y}_{k}^{T} \mathbf{s}_{k}}
SR1 Method에 있던 문제가 어떻게 개선되었는지 살펴보겠습니다.
SR1 Method에서는 Bk\mathbf{B}_{k}와 Hk\mathbf{H}_{k}가 Positive Definite Matrix라는 것이 보장되지 않았습니다.
BFGS Method에서 Hk\mathbf{H}_{k}가 Positive Definite Matrix라는 것은 다음과 같이 확인합니다.
우선 H0\mathbf{H}_{0}을 적당한 Positive Definite Matrix로(예를 들어 I로) 설정합니다.
만약에 Hk\mathbf{H}_{k}가Positive Definite Matrix라면 Hk+1\mathbf{H}_{k+1}이 Positive Definite Matrix가 되는지 다음과 같이 확인합니다.
wTHk+1w=wT(IskykTykTsk)Hk(IykskTykTsk)w+wTskskTykTskw=[(IykskTykTsk)w]THk[(IykskTykTsk)w]+skTw2ykTsk\begin{aligned}\mathbf{w}^{T} \mathbf{H}_{k+1} \mathbf{w} & =\mathbf{w}^{T}\left(\mathbf{I}-\frac{\mathbf{s}_{k} \mathbf{y}_{k}^{T}}{\mathbf{y}_{k}^{T} \mathbf{s}_{k}}\right) \mathbf{H}_{k}\left(\mathbf{I}-\frac{\mathbf{y}_{k} \mathbf{s}_{k}^{T}}{\mathbf{y}_{k}^{T} \mathbf{s}_{k}}\right) \mathbf{w}+\mathbf{w}^{T} \frac{\mathbf{s}_{k} \mathbf{s}_{k}^{T}}{\mathbf{y}_{k}^{T} \mathbf{s}_{k}} \mathbf{w} \\& =\left[\left(\mathbf{I}-\frac{\mathbf{y}_{k} \mathbf{s}_{k}^{T}}{\mathbf{y}_{k}^{T} \mathbf{s}_{k}}\right) \mathbf{w}\right]^{T} \mathbf{H}_{k}\left[\left(\mathbf{I}-\frac{\mathbf{y}_{k} \mathbf{s}_{k}^{T}}{\mathbf{y}_{k}^{T} \mathbf{s}_{k}}\right) \mathbf{w}\right]+\frac{\left\|\mathbf{s}_{k}^{T} \mathbf{w}\right\|^{2}}{\mathbf{y}_{k}^{T} \mathbf{s}_{k}}\end{aligned}
Hk\mathbf{H}_{k}가 Positive Definite Matrix이므로 모든 w에 대해 다음이 성립합니다.
[(IykskTykTsk)w]THk[(IykskTykTsk)w]>0\left[\left(\mathbf{I}-\frac{\mathbf{y}_{k} \mathbf{s}_{k}^{T}}{\mathbf{y}_{k}^{T} \mathbf{s}_{k}}\right) \mathbf{w}\right]^{T} \mathbf{H}_{k}\left[\left(\mathbf{I}-\frac{\mathbf{y}_{k} \mathbf{s}_{k}^{T}}{\mathbf{y}_{k}^{T} \mathbf{s}_{k}}\right) \mathbf{w}\right]>0
만약에 ykTsk>0\mathbf{y}_{k}^{T} \mathbf{s}_{k}>0이 성립한다면 skTw2ykTsk>0\frac{\left\|\mathbf{s}_{k}^{T} \mathbf{w}\right\|^{2}}{\mathbf{y}_{k}^{T} \mathbf{s}_{k}}>0이 성립하면서 Hk+1\mathbf{H}_{k+1}은 Positive Definite Matrix가 됩니다.
Wolfe Conditions의 Curvature Condition이 성립하면 ykTsk>0\mathbf{y}_{k}^{T} \mathbf{s}_{k}>0가 성립함을 다음과 같이 확인합니다.
tk>00<c2<1pk=Hkgkxk+1=xk+tkpkpkTf(xk+tkpk)c2pkTf(xk)pkTf(xk+1)c2pkTf(xk)pkTgk+1c2pkTgktkpkTgk+1tkc2pkTgkgk+1T(tkpk)c2gkT(tkpk)gk+1T(xk+1xk)c2gkT(xk+1xk)gk+1Tskc2gkTskgk+1TskgkTskc2gkTskgkTsk(gk+1TgkT)sk(c21)gkTskykTsk(c21)gkTsk=(c21)gkT(xk+1xk)=(c21)gkT(tkpk)=tk(c21)gkTpk=tk(c21)gkT(Hkgk)=tk(1c2)(gkTHkgk)>0ykTsk>0\begin{aligned} &t_k \gt 0 \\&0 \lt c_2 \lt 1 \\&\mathbf{p}_k=-\mathbf{H}_k\mathbf{g}_k \\&\mathbf{x}_{k+1}=\mathbf{x}_k+t_k\mathbf{p}_k \\&\mathbf{p}_k^T \nabla f(\mathbf{x}_k+t_k\mathbf{p}_k) \ge c_2 \mathbf{p}_k^T \nabla f(\mathbf{x}_k) \\&\mathbf{p}_k^T \nabla f(\mathbf{x}_{k+1}) \ge c_2 \mathbf{p}_k^T \nabla f(\mathbf{x}_k) \\&\mathbf{p}_k^T \mathbf{g}_{k+1} \ge c_2 \mathbf{p}_k^T \mathbf{g}_k \\&t_k\mathbf{p}_k^T \mathbf{g}_{k+1} \ge t_k c_2 \mathbf{p}_k^T \mathbf{g}_k \\&\mathbf{g}_{k+1}^T (t_k\mathbf{p}_k) \ge c_2 \mathbf{g}_k^T (t_k\mathbf{p}_k) \\&\mathbf{g}_{k+1}^T (\mathbf{x}_{k+1}-\mathbf{x}_k) \ge c_2 \mathbf{g}_k^T (\mathbf{x}_{k+1}-\mathbf{x}_k) \\&\mathbf{g}_{k+1}^T \mathbf{s}_k \ge c_2 \mathbf{g}_k^T \mathbf{s}_k \\&\mathbf{g}_{k+1}^T \mathbf{s}_k - \mathbf{g}_k^T \mathbf{s}_k \ge c_2 \mathbf{g}_k^T \mathbf{s}_k - \mathbf{g}_k^T \mathbf{s}_k \\&(\mathbf{g}_{k+1}^T - \mathbf{g}_k^T) \mathbf{s}_k \ge (c_2 - 1) \mathbf{g}_k^T \mathbf{s}_k\end{aligned} \\\begin{aligned}\mathbf{y}_k^T \mathbf{s}_k \ge (c_2 - 1) \mathbf{g}_k^T \mathbf{s}_k&=(c_2 - 1) \mathbf{g}_k^T ( \mathbf{x}_{k+1} - \mathbf{x}_k ) \\&=(c_2 - 1) \mathbf{g}_k^T (t_k \mathbf{p}_k) \\&=t_k (c_2 - 1) \mathbf{g}_k^T \mathbf{p}_k \\&=t_k (c_2 - 1) \mathbf{g}_k^T (-\mathbf{H}_k\mathbf{g}_k) \\&=t_k (1 - c_2) (\mathbf{g}_k^T\mathbf{H}_k\mathbf{g}_k) \gt 0 \\ \\\mathbf{y}_k^T \mathbf{s}_k \gt 0 \end{aligned}
BFGS Method에서는, Wolfe Conditions의 Curvature Condition을 따라 Step Size tk\mathbf{t}_{k}를 결정하고, Initial Inverse Hessian H0\mathbf{H}_{0}이 Positive Definite Matrix이면, Inverse Hessian Hk\mathbf{H}_{k}는 모든 kk에 대해 Positive Definite Matrix가 됩니다. 자연스럽게 Hessian bk\mathbf{b}_{k}도 모든 kk에 대해 Positive Definite Matrix가 됩니다.
SR1 Method에서는, Bk+1\mathbf{B}_{k+1}의 계산과정에서 (ykBksk)Tsk0\left(\mathbf{y}_{k}-\mathbf{B}_{k} \mathbf{s}_{k}\right)^{T} \mathbf{s}_{k} \approx 0이 되면, 분모가 0이 될 수가 있어서, Bk+1\mathbf{B}_{k+1}과 Hk+1\mathbf{H}_{k+1}을 안정적으로 구할 수 없었습니다.
BFGS Method에서는, Wolfe Conditions의 Curvature Condition을 따라 Step Size tk\mathbf{t}_{k}를 결정하면, ykTsk>0\mathbf{y}_{k}^{T} \mathbf{s}_{k}>0가 보장되어, 계산과정에서 분모가 0이 되지 않아서 안정적으로 Bk+1\mathbf{B}_{k+1}과 Hk+1\mathbf{H}_{k+1}을 구할 수 있습니다.
하지만 BFGS Method에는 큰 단점이 하나 있습니다. Iteration마다 Hk\mathbf{H}_{k}를 저장해야 하는데 이것은 x이 n차원 Vector인 경우에 Hk\mathbf{H}_{k}가 n×nn \times n의 Matrix가 되기 때문에 n이 매우 큰 경우에 Hk\mathbf{H}_{k}를 저장하기가 쉽지 않습니다.

Conclusion

이 글에서는 LBFGS Method를 살펴보기 위한 과정으로 BFGS Method에 대해 살펴보았습니다.
BFGS Method에는 x이 고차원인 Vector인 경우에 Hk\mathbf{H}_{k}를 저장하기가 쉽지 않은 문제가 있습니다.
Derivation of LBFGS Part 4 - LBFGS Method에서는 Hk\mathbf{H}_{k}를 저장하지 않고 BFGS Method를 수행하는 방법을 알아보도록 하겠습니다.
작성자
관련된 글 더 보기