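Introduction
These are my study notes on 『トピックモデルによる統計的潜在意味解析』. For the most part they fill in the steps between the equations in the book, so they are best read alongside it.
This article covers Section 3.3.6, the variational Bayes method for LDA in which the form of the approximate posterior is assumed in advance.
I am not strong at math, so I write the equations out at the level of detail I need to follow them myself; readers who already know the material will find it quite long-winded.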
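【Contents of this section】
3.3.6 Variational Bayes for LDA (2)
We derive the variational Bayes updates for LDA by assuming the form of the approximate posterior in advance.
For the approximate posteriors of the topic distribution and the word distribution, we assume Dirichlet distributions $q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta}), q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})$ with parameters $\boldsymbol{\xi}_d^{\theta}, \boldsymbol{\xi}_k^{\phi}$, respectively.
・Deriving the variational lower bound
Starting from the log marginal likelihood $\log p(\boldsymbol{w} | \boldsymbol{\alpha}, \boldsymbol{\beta})$, obtained by marginalizing (summing or integrating out) $\boldsymbol{z}, \boldsymbol{\phi}, \boldsymbol{\theta}$ and taking the logarithm, we apply Jensen's inequality to obtain the variational lower bound.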
$$
\begin{aligned}
\log p(\boldsymbol{w} | \boldsymbol{\alpha}, \boldsymbol{\beta})
&= \log \int \sum_{\boldsymbol{z}}
p(\boldsymbol{w}, \boldsymbol{z}, \boldsymbol{\phi}, \boldsymbol{\theta} | \boldsymbol{\alpha}, \boldsymbol{\beta})
d\boldsymbol{\phi} d\boldsymbol{\theta}
\\
&= \log \int \sum_{\boldsymbol{z}}
q(\boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{\phi} | \boldsymbol{\xi}^{\theta}, \boldsymbol{\xi}^{\phi})
\frac{
p(\boldsymbol{w}, \boldsymbol{z}, \boldsymbol{\phi}, \boldsymbol{\theta} | \boldsymbol{\alpha}, \boldsymbol{\beta})
}{
q(\boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{\phi} | \boldsymbol{\xi}^{\theta}, \boldsymbol{\xi}^{\phi})
}
d\boldsymbol{\phi} d\boldsymbol{\theta}
\\
&\geq
\int \sum_{\boldsymbol{z}}
q(\boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{\phi} | \boldsymbol{\xi}^{\theta}, \boldsymbol{\xi}^{\phi})
\log \frac{
p(\boldsymbol{w}, \boldsymbol{z}, \boldsymbol{\phi}, \boldsymbol{\theta} | \boldsymbol{\alpha}, \boldsymbol{\beta})
}{
q(\boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{\phi} | \boldsymbol{\xi}^{\theta}, \boldsymbol{\xi}^{\phi})
}
d\boldsymbol{\phi} d\boldsymbol{\theta}
\equiv F[q(\boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{\phi} | \boldsymbol{\xi}^{\theta}, \boldsymbol{\xi}^{\phi})]
\end{aligned}
$$
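As a small numerical check of the Jensen step, here is a minimal sketch with a single discrete latent variable. The values in `p_joint` and `q` are made up for illustration and do not come from the book; the point is only that the bound never exceeds the log marginal likelihood and becomes tight when `q` equals the true posterior.

```python
import math

# Hypothetical toy model: one latent variable z with three states.
# p_joint[k] stands for p(w, z=k) for one fixed observation w.
p_joint = [0.10, 0.05, 0.25]
# An arbitrary approximate posterior q(z); it must sum to 1.
q = [0.2, 0.3, 0.5]

# Log marginal likelihood: log sum_z p(w, z).
log_evidence = math.log(sum(p_joint))

# Variational lower bound: sum_z q(z) log(p(w, z) / q(z)).
lower_bound = sum(qk * math.log(pk / qk) for pk, qk in zip(p_joint, q))

# With q(z) set to the true posterior p(z | w), the bound is tight.
posterior = [pk / sum(p_joint) for pk in p_joint]
tight_bound = sum(qk * math.log(pk / qk) for pk, qk in zip(p_joint, posterior))
```

The gap `log_evidence - lower_bound` is the KL divergence from `q` to the true posterior, which is why it closes in the second case.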
Here, the approximate posterior is written as
$$
\begin{aligned}
q(\boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{\phi} | \boldsymbol{\xi}^{\theta}, \boldsymbol{\xi}^{\phi})
&= q(\boldsymbol{z})
q(\boldsymbol{\theta} | \boldsymbol{\xi}^{\theta})
q(\boldsymbol{\phi} | \boldsymbol{\xi}^{\phi})
\\
&= \left[ \prod_{d=1}^M \prod_{i=1}^{n_d}
q(z_{d,i})
\right]
\left[ \prod_{d=1}^M
q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})
\right]
\left[ \prod_{k=1}^K
q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})
\right]
\end{aligned}
$$
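That is, we assume that it factorizes in this way.
The joint distribution, following the generative process and the product rule of probability, is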
$$
\begin{aligned}
p(\boldsymbol{w}, \boldsymbol{z}, \boldsymbol{\phi}, \boldsymbol{\theta} | \boldsymbol{\alpha}, \boldsymbol{\beta})
&= p(\boldsymbol{w}, \boldsymbol{z} | \boldsymbol{\phi}, \boldsymbol{\theta})
p(\boldsymbol{\phi} | \boldsymbol{\beta})
p(\boldsymbol{\theta} | \boldsymbol{\alpha})
\\
&= p(\boldsymbol{w} | \boldsymbol{z}, \boldsymbol{\phi})
p(\boldsymbol{z} | \boldsymbol{\theta})
p(\boldsymbol{\phi} | \boldsymbol{\beta})
p(\boldsymbol{\theta} | \boldsymbol{\alpha})
\\
&= \left[ \prod_{d=1}^M \prod_{i=1}^{n_d}
p(w_{d,i} | \boldsymbol{\phi}_{z_{d,i}})
p(z_{d,i} | \boldsymbol{\theta}_d)
\right]
\left[ \prod_{k=1}^K
p(\boldsymbol{\phi}_k | \boldsymbol{\beta})
\right]
\left[ \prod_{d=1}^M
p(\boldsymbol{\theta}_d | \boldsymbol{\alpha})
\right]
\end{aligned}
$$
This is the factorized form of the joint distribution.
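To make the factorization concrete, the following sketch evaluates the log of the joint distribution term by term for a tiny corpus. The corpus, the parameter values, and the helper names `log_dirichlet_pdf` and `log_joint` are hypothetical choices of mine, not something taken from the book.

```python
import math

def log_dirichlet_pdf(x, conc):
    """Log density of a Dirichlet distribution with concentration `conc` at point `x`."""
    log_norm = math.lgamma(sum(conc)) - sum(math.lgamma(c) for c in conc)
    return log_norm + sum((c - 1.0) * math.log(xi) for c, xi in zip(conc, x))

def log_joint(docs, topics, theta, phi, alpha, beta):
    """log p(w, z, phi, theta | alpha, beta), one factor at a time."""
    total = 0.0
    for d, (words, zs) in enumerate(zip(docs, topics)):
        for w, z in zip(words, zs):
            total += math.log(phi[z][w])     # log p(w_{d,i} | phi_{z_{d,i}})
            total += math.log(theta[d][z])   # log p(z_{d,i} | theta_d)
    total += sum(log_dirichlet_pdf(phi_k, beta) for phi_k in phi)          # prod_k p(phi_k | beta)
    total += sum(log_dirichlet_pdf(theta_d, alpha) for theta_d in theta)  # prod_d p(theta_d | alpha)
    return total

# Hypothetical toy corpus: M=2 documents, K=2 topics, V=3 vocabulary items.
docs = [[0, 1, 1], [2, 0]]     # word ids w_{d,i}
topics = [[0, 0, 1], [1, 1]]   # topic assignments z_{d,i}
theta = [[0.7, 0.3], [0.4, 0.6]]
phi = [[0.5, 0.3, 0.2], [0.1, 0.2, 0.7]]
alpha = [1.0, 1.0]
beta = [1.0, 1.0, 1.0]

value = log_joint(docs, topics, theta, phi, alpha, beta)
```

With all hyperparameters equal to 1 the Dirichlet priors are uniform, so the prior factors contribute only constants and the value is easy to verify by hand.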
Therefore, the variational lower bound $F[q(\boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{\phi} | \boldsymbol{\xi}^{\theta}, \boldsymbol{\xi}^{\phi})]$ becomes
$$
\begin{align}
F[q(\boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{\phi} | \boldsymbol{\xi}^{\theta}, \boldsymbol{\xi}^{\phi})]
&= \int \sum_{\boldsymbol{z}}
q(\boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{\phi} | \boldsymbol{\xi}^{\theta}, \boldsymbol{\xi}^{\phi})
\log \frac{
p(\boldsymbol{w}, \boldsymbol{z}, \boldsymbol{\phi}, \boldsymbol{\theta} | \boldsymbol{\alpha}, \boldsymbol{\beta})
}{
q(\boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{\phi} | \boldsymbol{\xi}^{\theta}, \boldsymbol{\xi}^{\phi})
}
d\boldsymbol{\phi} d\boldsymbol{\theta}
\\
&= \int \sum_{\boldsymbol{z}}
q(\boldsymbol{z})
q(\boldsymbol{\theta} | \boldsymbol{\xi}^{\theta})
q(\boldsymbol{\phi} | \boldsymbol{\xi}^{\phi})
\log \frac{
p(\boldsymbol{w} | \boldsymbol{z}, \boldsymbol{\phi})
p(\boldsymbol{z} | \boldsymbol{\theta})
p(\boldsymbol{\phi} | \boldsymbol{\beta})
p(\boldsymbol{\theta} | \boldsymbol{\alpha})
}{
q(\boldsymbol{z})
q(\boldsymbol{\theta} | \boldsymbol{\xi}^{\theta})
q(\boldsymbol{\phi} | \boldsymbol{\xi}^{\phi})
}
d\boldsymbol{\phi} d\boldsymbol{\theta}
\\
&= \int \sum_{\boldsymbol{z}}
q(\boldsymbol{z})
q(\boldsymbol{\theta} | \boldsymbol{\xi}^{\theta})
q(\boldsymbol{\phi} | \boldsymbol{\xi}^{\phi}) \left(
\log \Bigl(
p(\boldsymbol{w} | \boldsymbol{z}, \boldsymbol{\phi})
p(\boldsymbol{z} | \boldsymbol{\theta})
\Bigr)
- \log q(\boldsymbol{z})
+ \log \frac{
p(\boldsymbol{\theta} | \boldsymbol{\alpha})
}{
q(\boldsymbol{\theta} | \boldsymbol{\xi}^{\theta})
}
+ \log \frac{
p(\boldsymbol{\phi} | \boldsymbol{\beta})
}{
q(\boldsymbol{\phi} | \boldsymbol{\xi}^{\phi})
}
\right)
d\boldsymbol{\phi} d\boldsymbol{\theta}
\\
&= \int \sum_{\boldsymbol{z}}
q(\boldsymbol{z})
q(\boldsymbol{\theta} | \boldsymbol{\xi}^{\theta})
q(\boldsymbol{\phi} | \boldsymbol{\xi}^{\phi})
\log \Bigl(
p(\boldsymbol{w} | \boldsymbol{z}, \boldsymbol{\phi})
p(\boldsymbol{z} | \boldsymbol{\theta})
\Bigr)
d\boldsymbol{\phi} d\boldsymbol{\theta}
- \sum_{\boldsymbol{z}}
q(\boldsymbol{z})
\log q(\boldsymbol{z}) \\
&\qquad
+ \int
q(\boldsymbol{\theta} | \boldsymbol{\xi}^{\theta})
\log \frac{
p(\boldsymbol{\theta} | \boldsymbol{\alpha})
}{
q(\boldsymbol{\theta} | \boldsymbol{\xi}^{\theta})
}
d\boldsymbol{\theta}
+ \int
q(\boldsymbol{\phi} | \boldsymbol{\xi}^{\phi})
\log \frac{
p(\boldsymbol{\phi} | \boldsymbol{\beta})
}{
q(\boldsymbol{\phi} | \boldsymbol{\xi}^{\phi})
}
d\boldsymbol{\phi}
\\
&= \int \sum_{d=1}^M \sum_{i=1}^{n_d} \sum_{k=1}^K
q(z_{d,i} = k)
q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})
q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})
\log \Bigl(
p(w_{d,i} | \boldsymbol{\phi}_{z_{d,i}})
p(z_{d,i} = k | \boldsymbol{\theta}_d)
\Bigr)
d\boldsymbol{\phi}_k d\boldsymbol{\theta}_d
\tag{a}\\
&\qquad
- \sum_{d=1}^M \sum_{i=1}^{n_d} \sum_{k=1}^K
q(z_{d,i} = k)
\log q(z_{d,i} = k) \\
&\qquad
+ \sum_{d=1}^M \int
q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})
\log \frac{
p(\boldsymbol{\theta}_d | \boldsymbol{\alpha})
}{
q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})
}
d\boldsymbol{\theta}_d
+ \sum_{k=1}^K \int
q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})
\log \frac{
p(\boldsymbol{\phi}_k | \boldsymbol{\beta})
}{
q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})
}
d\boldsymbol{\phi}_k
\tag{b}
\end{align}
$$
Expression (a) can be expanded further:
$$
\begin{align}
&\int \sum_{d=1}^M \sum_{i=1}^{n_d} \sum_{k=1}^K
q(z_{d,i} = k)
q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})
q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})
\log \Bigl(
p(w_{d,i} | \boldsymbol{\phi}_{z_{d,i}})
p(z_{d,i} = k | \boldsymbol{\theta}_d)
\Bigr)
d\boldsymbol{\phi}_k d\boldsymbol{\theta}_d
\tag{a}\\
&= \sum_{d=1}^M \sum_{i=1}^{n_d} \sum_{k=1}^K
q(z_{d,i} = k)
\int \int
q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})
q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})
\log \Bigl(
\theta_{d,k}
\prod_{v=1}^V
\phi_{k,v}^{\delta(w_{d,i}=v)}
\Bigr)
d\boldsymbol{\phi}_k d\boldsymbol{\theta}_d \\
\\
&= \sum_{d=1}^M \sum_{i=1}^{n_d} \sum_{k=1}^K
q(z_{d,i} = k)
\int \int
q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})
q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})
\Bigl(
\sum_{v=1}^V
\delta(w_{d,i}=v)
\log \phi_{k,v}
+ \log \theta_{d,k}
\Bigr)
d\boldsymbol{\phi}_k d\boldsymbol{\theta}_d \\
\\
&= \sum_{d=1}^M \sum_{i=1}^{n_d} \sum_{k=1}^K \sum_{v=1}^V
q(z_{d,i} = k)
\delta(w_{d,i} = v)
\int
q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})
\log \phi_{k,v}
d\boldsymbol{\phi}_k \\
&\qquad
+ \sum_{d=1}^M \sum_{i=1}^{n_d} \sum_{k=1}^K
q(z_{d,i} = k)
\int
q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})
\log \theta_{d,k}
d\boldsymbol{\theta}_d \\
\\
&= \sum_{k=1}^K \sum_{v=1}^V
\mathbb{E}_{q(\boldsymbol{z})}[
n_{k,v}
]
\mathbb{E}_{q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})}[
\log \phi_{k,v}
]
+ \sum_{d=1}^M \sum_{k=1}^K
\mathbb{E}_{q(\boldsymbol{z}_d)}[
n_{d,k}
]
\mathbb{E}_{q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})}[
\log \theta_{d,k}
]
\end{align}
$$
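The expected counts in the last line are just sums of $q(z_{d,i} = k)$: over the tokens of document $d$ for $n_{d,k}$, and over all tokens whose word is $v$ for $n_{k,v}$. A minimal sketch, assuming a toy corpus and a made-up `q_z` (both hypothetical, as is the function name `expected_counts`):

```python
# Hypothetical toy inputs: docs[d][i] is the word id w_{d,i}, and
# q_z[d][i][k] is the variational probability q(z_{d,i} = k).
docs = [[0, 1, 1], [2, 0]]
q_z = [
    [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]],
    [[0.3, 0.7], [0.5, 0.5]],
]
K, V = 2, 3

def expected_counts(docs, q_z, K, V):
    """Return (E[n_{d,k}], E[n_{k,v}]) under the factorized q(z)."""
    n_dk = [[0.0] * K for _ in docs]
    n_kv = [[0.0] * V for _ in range(K)]
    for d, words in enumerate(docs):
        for i, v in enumerate(words):
            for k in range(K):
                n_dk[d][k] += q_z[d][i][k]   # sum_i q(z_{d,i} = k)
                n_kv[k][v] += q_z[d][i][k]   # sum_d sum_i q(z_{d,i} = k) delta(w_{d,i} = v)
    return n_dk, n_kv

n_dk, n_kv = expected_counts(docs, q_z, K, V)
```

Each row of `n_dk` sums to the length of the corresponding document, and all entries of `n_kv` together sum to the total number of tokens.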
Next, we rewrite expression (b) in terms of KL divergences.
$$
\begin{align}
&\sum_{d=1}^M \int
q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})
\log \frac{
p(\boldsymbol{\theta}_d | \boldsymbol{\alpha})
}{
q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})
}
d\boldsymbol{\theta}_d
+ \sum_{k=1}^K \int
q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})
\log \frac{
p(\boldsymbol{\phi}_k | \boldsymbol{\beta})
}{
q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})
}
d\boldsymbol{\phi}_k
\tag{b}\\
&= \sum_{d=1}^M \int
q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})
\Bigl(
\log p(\boldsymbol{\theta}_d | \boldsymbol{\alpha})
- \log q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})
\Bigr)
d\boldsymbol{\theta}_d
+ \sum_{k=1}^K \int
q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})
\Bigl(
\log p(\boldsymbol{\phi}_k | \boldsymbol{\beta})
- \log q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})
\Bigr)
d\boldsymbol{\phi}_k
\\
&= - \sum_{d=1}^M \int
q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})
\Bigl(
- \log p(\boldsymbol{\theta}_d | \boldsymbol{\alpha})
+ \log q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})
\Bigr)
d\boldsymbol{\theta}_d
- \sum_{k=1}^K \int
q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})
\Bigl(
- \log p(\boldsymbol{\phi}_k | \boldsymbol{\beta})
+ \log q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})
\Bigr)
d\boldsymbol{\phi}_k
\\
&= - \sum_{d=1}^M \int
q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})
\log \frac{
q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})
}{
p(\boldsymbol{\theta}_d | \boldsymbol{\alpha})
}
d\boldsymbol{\theta}_d
- \sum_{k=1}^K \int
q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})
\log \frac{
q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})
}{
p(\boldsymbol{\phi}_k | \boldsymbol{\beta})
}
d\boldsymbol{\phi}_k
\\
&= - \sum_{d=1}^M
{\rm KL}[
q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})
\parallel
p(\boldsymbol{\theta}_d | \boldsymbol{\alpha})
]
- \sum_{k=1}^K
{\rm KL}[
q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})
\parallel
p(\boldsymbol{\phi}_k | \boldsymbol{\beta})
]
\end{align}
$$
Furthermore, we substitute the closed form of the KL divergence between Dirichlet distributions from Eq. (3.81).
$$
\begin{aligned}
&- \sum_{d=1}^M
{\rm KL}[
q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})
\parallel
p(\boldsymbol{\theta}_d | \boldsymbol{\alpha})
]
- \sum_{k=1}^K
{\rm KL}[
q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})
\parallel
p(\boldsymbol{\phi}_k | \boldsymbol{\beta})
]
\\
&= - \sum_{d=1}^M \left[
\log \frac{
\Gamma(\sum_{k=1}^K \xi_{d,k}^{\theta})
}{
\prod_{k=1}^K \Gamma(\xi_{d,k}^{\theta})
}
- \log \frac{
\Gamma(\sum_{k=1}^K \alpha_k)
}{
\prod_{k=1}^K \Gamma(\alpha_k)
}
\right]
- \sum_{d=1}^M \sum_{k=1}^K
(\xi_{d,k}^{\theta} - \alpha_k)
\mathbb{E}_{q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})}[
\log \theta_{d,k}
] \\
&\qquad
- \sum_{k=1}^K \left[
\log \frac{
\Gamma(\sum_{v=1}^V \xi_{k,v}^{\phi})
}{
\prod_{v=1}^V \Gamma(\xi_{k,v}^{\phi})
}
- \log \frac{
\Gamma(\sum_{v=1}^V \beta_v)
}{
\prod_{v=1}^V \Gamma(\beta_v)
}
\right]
- \sum_{k=1}^K \sum_{v=1}^V
(\xi_{k,v}^{\phi} - \beta_v)
\mathbb{E}_{q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})}[
\log \phi_{k,v}
]
\\
&= \sum_{d=1}^M \left[
\log \frac{
\Gamma(\sum_{k=1}^K \alpha_k)
}{
\prod_{k=1}^K \Gamma(\alpha_k)
}
- \log \frac{
\Gamma(\sum_{k=1}^K \xi_{d,k}^{\theta})
}{
\prod_{k=1}^K \Gamma(\xi_{d,k}^{\theta})
}
\right]
+ \sum_{d=1}^M \sum_{k=1}^K
(\alpha_k - \xi_{d,k}^{\theta})
\mathbb{E}_{q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})}[
\log \theta_{d,k}
] \\
&\qquad
+ \sum_{k=1}^K \left[
\log \frac{
\Gamma(\sum_{v=1}^V \beta_v)
}{
\prod_{v=1}^V \Gamma(\beta_v)
}
- \log \frac{
\Gamma(\sum_{v=1}^V \xi_{k,v}^{\phi})
}{
\prod_{v=1}^V \Gamma(\xi_{k,v}^{\phi})
}
\right]
+ \sum_{k=1}^K \sum_{v=1}^V
(\beta_v - \xi_{k,v}^{\phi})
\mathbb{E}_{q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})}[
\log \phi_{k,v}
]
\end{aligned}
$$
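The closed form of the Dirichlet KL divergence used above is easy to check numerically. The sketch below uses only the standard library; since `math` has no digamma function, `digamma` here is my own small approximation (recurrence plus an asymptotic series), and the function names are hypothetical.

```python
import math

def digamma(x):
    """Digamma for x > 0: shift x upward by recurrence, then use an asymptotic series."""
    shift = 0.0
    while x < 6.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + math.log(x) - 0.5 / x - series

def expected_log(conc):
    """E[log x_j] under Dirichlet(conc): Psi(conc_j) - Psi(sum(conc))."""
    total = digamma(sum(conc))
    return [digamma(c) - total for c in conc]

def log_norm(conc):
    """log of Gamma(sum(conc)) / prod_j Gamma(conc_j)."""
    return math.lgamma(sum(conc)) - sum(math.lgamma(c) for c in conc)

def kl_dirichlet(xi, alpha):
    """KL[Dirichlet(xi) || Dirichlet(alpha)] in the closed form used above."""
    e_log = expected_log(xi)
    return (log_norm(xi) - log_norm(alpha)
            + sum((x - a) * e for x, a, e in zip(xi, alpha, e_log)))

# Dirichlet(2, 1) is Beta(2, 1) with density 2x, so the KL divergence to the
# uniform Dirichlet(1, 1) can also be integrated by hand: log 2 - 1/2.
kl_example = kl_dirichlet([2.0, 1.0], [1.0, 1.0])
```

The divergence is zero when both arguments coincide and positive otherwise, which is what lets the update equations later be read as "move $q$ onto the optimal Dirichlet".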
Therefore, the variational lower bound is
$$
\begin{align}
F[q(\boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{\phi} | \boldsymbol{\xi}^{\theta}, \boldsymbol{\xi}^{\phi})]
&= \sum_{k=1}^K \sum_{v=1}^V
\mathbb{E}_{q(\boldsymbol{z})}[
n_{k,v}
]
\mathbb{E}_{q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})}[
\log \phi_{k,v}
]
+ \sum_{d=1}^M \sum_{k=1}^K
\mathbb{E}_{q(\boldsymbol{z}_d)}[
n_{d,k}
]
\mathbb{E}_{q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})}[
\log \theta_{d,k}
] \\
&\qquad
- \sum_{d=1}^M \sum_{i=1}^{n_d} \sum_{k=1}^K
q(z_{d,i} = k)
\log q(z_{d,i} = k) \\
&\qquad
+ \sum_{d=1}^M \left[
\log \frac{
\Gamma(\sum_{k=1}^K \alpha_k)
}{
\prod_{k=1}^K \Gamma(\alpha_k)
}
- \log \frac{
\Gamma(\sum_{k=1}^K \xi_{d,k}^{\theta})
}{
\prod_{k=1}^K \Gamma(\xi_{d,k}^{\theta})
}
\right]
+ \sum_{d=1}^M \sum_{k=1}^K
(\alpha_k - \xi_{d,k}^{\theta})
\mathbb{E}_{q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})}[
\log \theta_{d,k}
] \\
&\qquad
+ \sum_{k=1}^K \left[
\log \frac{
\Gamma(\sum_{v=1}^V \beta_v)
}{
\prod_{v=1}^V \Gamma(\beta_v)
}
- \log \frac{
\Gamma(\sum_{v=1}^V \xi_{k,v}^{\phi})
}{
\prod_{v=1}^V \Gamma(\xi_{k,v}^{\phi})
}
\right]
+ \sum_{k=1}^K \sum_{v=1}^V
(\beta_v - \xi_{k,v}^{\phi})
\mathbb{E}_{q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})}[
\log \phi_{k,v}
]
\\
&= \sum_{k=1}^K \left[
\log \frac{
\Gamma(\sum_{v=1}^V \beta_v)
}{
\prod_{v=1}^V \Gamma(\beta_v)
}
- \log \frac{
\Gamma(\sum_{v=1}^V \xi_{k,v}^{\phi})
}{
\prod_{v=1}^V \Gamma(\xi_{k,v}^{\phi})
}
\right]
+ \sum_{k=1}^K \sum_{v=1}^V (
\mathbb{E}_{q(\boldsymbol{z})}[
n_{k,v}
]
+ \beta_v - \xi_{k,v}^{\phi}
)
\mathbb{E}_{q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})}[
\log \phi_{k,v}
] \\
&\qquad
+ \sum_{d=1}^M \left[
\log \frac{
\Gamma(\sum_{k=1}^K \alpha_k)
}{
\prod_{k=1}^K \Gamma(\alpha_k)
}
- \log \frac{
\Gamma(\sum_{k=1}^K \xi_{d,k}^{\theta})
}{
\prod_{k=1}^K \Gamma(\xi_{d,k}^{\theta})
}
\right]
+ \sum_{d=1}^M \sum_{k=1}^K (
\mathbb{E}_{q(\boldsymbol{z}_d)}[
n_{d,k}
]
+ \alpha_k - \xi_{d,k}^{\theta}
)
\mathbb{E}_{q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})}[
\log \theta_{d,k}
] \\
&\qquad
- \sum_{d=1}^M \sum_{i=1}^{n_d} \sum_{k=1}^K
q(z_{d,i} = k)
\log q(z_{d,i} = k)
\tag{3.102}
\end{align}
$$
This is Eq. (3.102).
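For reference, Eq. (3.102) can be evaluated directly on a toy problem. This is only a sketch under my own assumptions: the corpus, `q_z`, and the hyperparameters are made up, `digamma` is a hand-written approximation because the standard library lacks one, and all function names are hypothetical.

```python
import math

def digamma(x):
    """Digamma for x > 0: shift x upward by recurrence, then use an asymptotic series."""
    shift = 0.0
    while x < 6.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + math.log(x) - 0.5 / x - series

def expected_log(conc):
    """E[log x_j] under Dirichlet(conc)."""
    total = digamma(sum(conc))
    return [digamma(c) - total for c in conc]

def log_norm(conc):
    """log of Gamma(sum(conc)) / prod_j Gamma(conc_j)."""
    return math.lgamma(sum(conc)) - sum(math.lgamma(c) for c in conc)

def lower_bound(docs, q_z, xi_theta, xi_phi, alpha, beta):
    """Evaluate the variational lower bound of Eq. (3.102)."""
    K, V = len(alpha), len(beta)
    n_dk = [[0.0] * K for _ in docs]       # E[n_{d,k}]
    n_kv = [[0.0] * V for _ in range(K)]   # E[n_{k,v}]
    value = 0.0
    for d, words in enumerate(docs):
        for i, v in enumerate(words):
            for k in range(K):
                p = q_z[d][i][k]
                n_dk[d][k] += p
                n_kv[k][v] += p
                if p > 0.0:
                    value -= p * math.log(p)   # entropy term of q(z)
    for k in range(K):
        e_log = expected_log(xi_phi[k])
        value += log_norm(beta) - log_norm(xi_phi[k])
        value += sum((n_kv[k][v] + beta[v] - xi_phi[k][v]) * e_log[v] for v in range(V))
    for d in range(len(docs)):
        e_log = expected_log(xi_theta[d])
        value += log_norm(alpha) - log_norm(xi_theta[d])
        value += sum((n_dk[d][k] + alpha[k] - xi_theta[d][k]) * e_log[k] for k in range(K))
    return value

# Hypothetical toy problem.
docs = [[0, 1, 1], [2, 0]]
q_z = [[[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]], [[0.3, 0.7], [0.5, 0.5]]]
alpha = [0.5, 0.5]
beta = [0.1, 0.1, 0.1]
xi_theta = [[2.2, 1.8], [1.3, 1.7]]           # E[n_{d,k}] + alpha_k for this q(z)
xi_phi = [[1.5, 0.9, 0.4], [0.7, 1.3, 0.8]]   # E[n_{k,v}] + beta_v for this q(z)

bound_at_update = lower_bound(docs, q_z, xi_theta, xi_phi, alpha, beta)
bound_elsewhere = lower_bound(docs, q_z, [[1.0, 1.0], [1.0, 1.0]], xi_phi, alpha, beta)
```

For a fixed `q_z`, the bound should be larger when the Dirichlet parameters are set to expected counts plus prior than for any other choice, which is what comparing `bound_at_update` with `bound_elsewhere` illustrates.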
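・Deriving the parameters of the approximate posterior of the topic distribution
Differentiating the variational lower bound with respect to $\xi_{d,k}^{\theta}$ gives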
$$
\begin{aligned}
\frac{
\partial F[q(\boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{\phi} | \boldsymbol{\xi}^{\theta}, \boldsymbol{\xi}^{\phi})]
}{
\partial \xi_{d,k}^{\theta}
}
&= \frac{\partial}{\partial \xi_{d,k}^{\theta}} \left(
- \log \frac{
\Gamma(\sum_{k'=1}^K \xi_{d,k'}^{\theta})
}{
\prod_{k'=1}^K \Gamma(\xi_{d,k'}^{\theta})
}
\right)
+ \frac{\partial}{\partial \xi_{d,k}^{\theta}} \left(
- \xi_{d,k}^{\theta}
\mathbb{E}_{q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})}[
\log \theta_{d,k}
]
\right) \\
&\qquad
+ \sum_{k'=1}^K (
\mathbb{E}_{q(\boldsymbol{z}_d)}[
n_{d,k'}
]
+ \alpha_{k'} - \xi_{d,k'}^{\theta}
)
\frac{\partial}{\partial \xi_{d,k}^{\theta}} \left(
\mathbb{E}_{q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})}[
\log \theta_{d,k'}
]
\right)
\end{aligned}
$$
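It therefore suffices to evaluate these three terms.
For the first term, we use the psi (digamma) function (equivalently, the step from Eq. (3.76) to Eq. (3.77)) together with the Dirichlet expectation in Eq. (3.74) to obtain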
$$
\begin{aligned}
\frac{\partial}{\partial \xi_{d,k}^{\theta}} \left(
- \log \frac{
\Gamma(\sum_{k'=1}^K \xi_{d,k'}^{\theta})
}{
\prod_{k'=1}^K \Gamma(\xi_{d,k'}^{\theta})
}
\right)
&= \frac{\partial}{\partial \xi_{d,k}^{\theta}} \left(
\sum_{k'=1}^K
\log \Gamma(\xi_{d,k'}^{\theta})
- \log \Gamma \Bigl(\sum_{k'=1}^K \xi_{d,k'}^{\theta} \Bigr)
\right)
\\
&= \Psi(\xi_{d,k}^{\theta})
- \Psi \left(\sum_{k'=1}^K \xi_{d,k'}^{\theta} \right)
\\
&= \mathbb{E}_{q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})}[
\log \theta_{d,k}
]
\end{aligned}
$$
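The identity in this step, that the derivative of the negative log normalizer is $\Psi(\xi_{d,k}^{\theta}) - \Psi(\sum_{k'} \xi_{d,k'}^{\theta})$, can be checked with a finite difference. This is a rough sketch: the parameter vector `xi` is arbitrary, and `digamma` is my own approximation since the standard library does not provide one.

```python
import math

def digamma(x):
    """Digamma for x > 0: shift x upward by recurrence, then use an asymptotic series."""
    shift = 0.0
    while x < 6.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + math.log(x) - 0.5 / x - series

def neg_log_norm(xi):
    """sum_k log Gamma(xi_k) - log Gamma(sum_k xi_k)."""
    return sum(math.lgamma(x) for x in xi) - math.lgamma(sum(xi))

# Arbitrary (hypothetical) Dirichlet parameters and the index to differentiate.
xi = [1.5, 2.0, 0.7]
k = 0
h = 1e-5

# Central finite difference of the negative log normalizer in xi_k.
xi_plus = list(xi)
xi_plus[k] += h
xi_minus = list(xi)
xi_minus[k] -= h
numeric = (neg_log_norm(xi_plus) - neg_log_norm(xi_minus)) / (2 * h)

# Analytic value: Psi(xi_k) - Psi(sum xi), i.e. E[log theta_k] under Dirichlet(xi).
analytic = digamma(xi[k]) - digamma(sum(xi))
```

Both numbers are negative, as expected for the mean of the logarithm of a probability.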
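For the second term, differentiating the explicit factor $\xi_{d,k}^{\theta}$ while holding the expectation fixed (the derivative of the expectation itself is collected in the third term) gives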
$$
\begin{aligned}
\frac{\partial}{\partial \xi_{d,k}^{\theta}} \left(
- \xi_{d,k}^{\theta}
\mathbb{E}_{q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})}[
\log \theta_{d,k}
]
\right)
&= - \mathbb{E}_{q(\boldsymbol{\theta}_d | \boldsymbol{\xi}_d^{\theta})}[
\log \theta_{d,k}
]
\end{aligned}
$$
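This cancels against the first term. It therefore suffices to find the $\xi_{d,k}^{\theta}$ for which the third term is zero. Because the third term is a sum over $k'$, it vanishes when every coefficient in parentheses is zero, so for each $k$ we set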
$$
\begin{aligned}
\mathbb{E}_{q(\boldsymbol{z}_d)}[
n_{d,k}
]
+ \alpha_{k} - \xi_{d,k}^{\theta}
&= 0
\\
\xi_{d,k}^{\theta}
&= \mathbb{E}_{q(\boldsymbol{z}_d)}[
n_{d,k}
]
+ \alpha_{k}
\end{aligned}
$$
This agrees with Eq. (3.89).
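As a sanity check on this result, the sketch below evaluates the part of Eq. (3.102) that depends on $\boldsymbol{\xi}_d^{\theta}$ for one document and confirms numerically that its gradient vanishes at $\xi_{d,k}^{\theta} = \mathbb{E}[n_{d,k}] + \alpha_k$. The expected counts `n_d`, the prior `alpha`, and the function names are hypothetical, and `digamma` is a hand-written approximation.

```python
import math

def digamma(x):
    """Digamma for x > 0: shift x upward by recurrence, then use an asymptotic series."""
    shift = 0.0
    while x < 6.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + math.log(x) - 0.5 / x - series

def theta_terms(xi, n_d, alpha):
    """Terms of Eq. (3.102) that depend on xi_d^theta, for a single document d."""
    psi_total = digamma(sum(xi))
    log_norm = math.lgamma(sum(xi)) - sum(math.lgamma(x) for x in xi)
    weighted = sum((n + a - x) * (digamma(x) - psi_total)
                   for n, a, x in zip(n_d, alpha, xi))
    return -log_norm + weighted

# Hypothetical expected counts E[n_{d,k}] and prior alpha for one document.
n_d = [1.7, 1.3, 0.0]
alpha = [0.5, 0.5, 0.5]

# Closed-form update derived above: xi_{d,k} = E[n_{d,k}] + alpha_k.
xi_star = [n + a for n, a in zip(n_d, alpha)]

def numeric_gradient(xi, h=1e-5):
    """Central finite-difference gradient of theta_terms with respect to xi."""
    grads = []
    for k in range(len(xi)):
        up = list(xi)
        up[k] += h
        down = list(xi)
        down[k] -= h
        grads.append((theta_terms(up, n_d, alpha) - theta_terms(down, n_d, alpha)) / (2 * h))
    return grads

grad_at_update = numeric_gradient(xi_star)
```

The gradient is zero only up to finite-difference and approximation error, so it should be compared against a small tolerance rather than exactly zero.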
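・Deriving the parameters of the approximate posterior of the word distribution
As with the topic distribution, differentiating the variational lower bound with respect to $\xi_{k,v}^{\phi}$ gives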
$$
\begin{aligned}
\frac{
\partial F[q(\boldsymbol{z}, \boldsymbol{\theta}, \boldsymbol{\phi} | \boldsymbol{\xi}^{\theta}, \boldsymbol{\xi}^{\phi})]
}{
\partial \xi_{k,v}^{\phi}
}
&= \frac{\partial}{\partial \xi_{k,v}^{\phi}} \left(
- \log \frac{
\Gamma(\sum_{v'=1}^V \xi_{k,v'}^{\phi})
}{
\prod_{v'=1}^V \Gamma(\xi_{k,v'}^{\phi})
}
\right)
+ \frac{\partial}{\partial \xi_{k,v}^{\phi}} \left(
- \xi_{k,v}^{\phi}
\mathbb{E}_{q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})}[
\log \phi_{k,v}
]
\right) \\
&\qquad
+ \sum_{v'=1}^V (
\mathbb{E}_{q(\boldsymbol{z})}[
n_{k,v'}
]
+ \beta_{v'} - \xi_{k,v'}^{\phi}
)
\frac{\partial}{\partial \xi_{k,v}^{\phi}} \left(
\mathbb{E}_{q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})}[
\log \phi_{k,v'}
]
\right)
\end{aligned}
$$
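It therefore suffices to evaluate these three terms.
For the first term, we use the psi (digamma) function (equivalently, the step from Eq. (3.76) to Eq. (3.77)) together with the Dirichlet expectation in Eq. (3.74) to obtain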
$$
\begin{align}
\frac{\partial}{\partial \xi_{k,v}^{\phi}} \left(
- \log \frac{
\Gamma(\sum_{v'=1}^V \xi_{k,v'}^{\phi})
}{
\prod_{v'=1}^V \Gamma(\xi_{k,v'}^{\phi})
}
\right)
&= \frac{\partial}{\partial \xi_{k,v}^{\phi}} \left(
\sum_{v'=1}^V
\log \Gamma(\xi_{k,v'}^{\phi})
- \log \Gamma \Bigl(\sum_{v'=1}^V \xi_{k,v'}^{\phi} \Bigr)
\right)
\\
&= \Psi(\xi_{k,v}^{\phi})
- \Psi \left(\sum_{v'=1}^V \xi_{k,v'}^{\phi} \right)
\\
&= \mathbb{E}_{q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})} [
\log \phi_{k,v}
]
\tag{3.103}
\end{align}
$$
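For the second term, differentiating the explicit factor $\xi_{k,v}^{\phi}$ while holding the expectation fixed (the derivative of the expectation itself is collected in the third term) gives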
$$
\begin{aligned}
\frac{\partial}{\partial \xi_{k,v}^{\phi}} \left(
- \xi_{k,v}^{\phi}
\mathbb{E}_{q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})}[
\log \phi_{k,v}
]
\right)
&= - \mathbb{E}_{q(\boldsymbol{\phi}_k | \boldsymbol{\xi}_k^{\phi})}[
\log \phi_{k,v}
]
\end{aligned}
$$
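This cancels against the first term. It therefore suffices to find the $\xi_{k,v}^{\phi}$ for which the third term is zero. Because the third term is a sum over $v'$, it vanishes when every coefficient in parentheses is zero, so for each $v$ we set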
$$
\begin{aligned}
\mathbb{E}_{q(\boldsymbol{z})}[
n_{k,v}
]
+ \beta_{v} - \xi_{k,v}^{\phi}
&= 0
\\
\xi_{k,v}^{\phi}
&= \mathbb{E}_{q(\boldsymbol{z})}[
n_{k,v}
]
+ \beta_{v}
\end{aligned}
$$
This agrees with Eq. (3.95).
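In code, this update amounts to starting from the prior $\beta_v$ and adding $q(z_{d,i} = k)$ to the entry for each token's word. A minimal sketch with a made-up corpus and `q_z`; the function name `update_xi_phi` is hypothetical.

```python
# Hypothetical toy inputs: docs[d][i] is the word id w_{d,i}, and
# q_z[d][i][k] is the variational probability q(z_{d,i} = k).
docs = [[0, 1, 1], [2, 0]]
q_z = [
    [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]],
    [[0.3, 0.7], [0.5, 0.5]],
]
beta = [0.1, 0.1, 0.1]   # prior beta_v, one entry per vocabulary item
K = 2

def update_xi_phi(docs, q_z, beta, K):
    """xi_{k,v}^phi = E[n_{k,v}] + beta_v, with E[n_{k,v}] accumulated from q(z)."""
    xi_phi = [list(beta) for _ in range(K)]   # start from the prior beta_v
    for words, q_doc in zip(docs, q_z):
        for v, q_token in zip(words, q_doc):
            for k in range(K):
                xi_phi[k][v] += q_token[k]    # add q(z_{d,i} = k) for word w_{d,i} = v
    return xi_phi

xi_phi = update_xi_phi(docs, q_z, beta, K)

# Mean of phi_{k,v} under Dirichlet(xi_k^phi), as a point estimate of the word distribution.
phi_mean = [[x / sum(row) for x in row] for row in xi_phi]
```

The same pattern with $\alpha_k$ and per-document sums gives the update for $\xi_{d,k}^{\theta}$.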
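References
- Issei Sato, 『トピックモデルによる統計的潜在意味解析』 (Natural Language Processing Series 8), supervised by Manabu Okumura, Corona Publishing, 2015.
Closing remarks
I am starting to want to finish this rehab quickly and take on something new!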