Some findings from the paper "Discovering Influential Factors in VAEs".
We believe that data can be generated by a number of independent generative factors. In the work on discovering the influential generative factors of the variational autoencoder (VAE), we found that the KL-divergence term optimized by the VAE promotes both the independence of the factors and the sparsity of the mutual information between each factor and the data. The influential generative factors extracted by the VAE capture the variation in the data well.
If we have $$p(z_1,\cdots,z_h|x)=p(z_1|x)\cdots p(z_h|x)$$ and $$p(z_1,\cdots,z_h)=p(z_1)\cdots p(z_h),$$ then, according to the theorem in the paper, the mutual information separates across dimensions: $$I(z_1,\cdots,z_h;x)=I(z_1;x)+\cdots+I(z_h;x).$$ Therefore the second objective in the VAE decomposes as $$E_{x}D_{KL}(q(\mathbf{z}|x)||p(\mathbf{z}))=\sum_{i=1}^{h}\left[I(z_i;x)+D_{KL}(q(z_i)||p(z_i))\right].$$ Because the mutual information is separable, minimizing this term drives $I(z_i;x)$ toward zero in many dimensions (posterior collapse on those dimensions), which allows us to use a small number of factor dimensions and still achieve good performance on subsequent classification or reconstruction tasks.
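To make the per-dimension decomposition concrete, here is a minimal numerical sketch (our own illustration, not from the paper), assuming a diagonal Gaussian posterior $q(\mathbf{z}|x)=\mathcal{N}(\mu(x),\mathrm{diag}(\sigma^2(x)))$ and a standard-normal prior; the encoder outputs below are made up. Dimensions whose posterior matches the prior contribute zero KL, which is what posterior collapse looks like at the level of individual factors.

```python
import numpy as np

# Minimal sketch (not the paper's code): for a diagonal Gaussian posterior
# q(z|x) = N(mu(x), diag(sigma(x)^2)) and a standard-normal prior p(z),
# the VAE's KL term separates into per-dimension contributions.

def kl_per_dim(mu, log_var):
    """KL( N(mu_i, sigma_i^2) || N(0, 1) ) for each latent dimension i."""
    return 0.5 * (mu**2 + np.exp(log_var) - 1.0 - log_var)

# Hypothetical encoder outputs for one input x, with h = 4 latent dimensions.
mu      = np.array([1.2, 0.0, -0.8, 0.0])
log_var = np.array([-1.5, 0.0, -0.9, 0.0])

per_dim = kl_per_dim(mu, log_var)

# The total KL is exactly the sum of the per-dimension terms, mirroring
# the separability of the mutual information above.
print(per_dim)        # dims 2 and 4 are exactly 0: collapsed to the prior
print(per_dim.sum())  # total KL contribution for this x
```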
According to the properties of the mutual-information lasso, it yields an $\ell_1$-type penalty on the per-dimension mutual informations, which promotes a sparse solution in which most $I(z_i;x)$ are exactly zero.
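As a short worked step (our own restatement, not quoted from the paper): since mutual information is nonnegative, the sum over dimensions is exactly an $\ell_1$ norm of the mutual-information vector, $$\sum_{i=1}^{h} I(z_i;x)=\sum_{i=1}^{h}\left|I(z_i;x)\right|=\left\|\big(I(z_1;x),\cdots,I(z_h;x)\big)\right\|_1,$$ and minimizing an $\ell_1$ penalty, as in the lasso, pushes many coordinates to exactly zero, so only a few "influential" factor dimensions retain information about $x$.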