Reputation: 175
A variational autoencoder optimizes the ELBO objective:

$$\log p_\theta(x) \;\geq\; \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] \;-\; D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)$$
Nowhere is the ELBO based on a joint or factorized probability distribution of two variables x and y; nothing in the objective itself forces the model to make use of the condition. Therefore, the encoder of a conditional VAE seems to be able to ignore y (the condition) entirely, and merely retain the information required for reconstructing x (the image). Similarly, the decoder seems to be able to ignore y and merely extract from the latent sample the information required for reconstructing x. Given a setup such as the picture below (courtesy of Montserrat, Bustamante & Ioannidis), why is this not a more frequently occurring problem in practice? Is it because the condition already provides a significant part of the representation in a simple form?
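For concreteness, here is a minimal sketch of the conditioning pattern in such a setup (layer names and sizes are illustrative, not taken from the referenced paper): y is concatenated to the encoder input and to the latent sample before decoding, yet the loss itself never mentions y.

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    """Minimal conditional VAE sketch: y is concatenated to both the
    encoder input and the latent sample fed to the decoder."""

    def __init__(self, x_dim=784, y_dim=10, z_dim=20, h_dim=400):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + y_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(
            nn.Linear(z_dim + y_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, x_dim), nn.Sigmoid(),
        )

    def forward(self, x, y):
        h = self.enc(torch.cat([x, y], dim=-1))               # encoder sees (x, y)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        x_hat = self.dec(torch.cat([z, y], dim=-1))           # decoder sees (z, y)
        return x_hat, mu, logvar

def neg_elbo(x, x_hat, mu, logvar):
    # Reconstruction term + KL(q(z|x,y) || N(0,I)). Note that neither
    # term mentions y: nothing in the loss forces the network to use it.
    rec = nn.functional.binary_cross_entropy(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl

# Usage with dummy data:
model = CVAE()
x = torch.rand(8, 784)                                        # flattened images
y = nn.functional.one_hot(torch.arange(8) % 10, 10).float()  # one-hot labels
x_hat, mu, logvar = model(x, y)
loss = neg_elbo(x, x_hat, mu, logvar)
```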
Upvotes: 1
Views: 159
Reputation: 1
Therefore, the encoder of a conditional VAE seems to be able to ignore y (the condition) entirely,
My intuition might be wrong, but what about removing the condition that is fed into the decoder? I feel like that would force the encoder to compress the condition into z.
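If it helps, here is a sketch of that change (sizes are illustrative, relative to the usual setup where the decoder input is the concatenation of z and y):

```python
import torch.nn as nn

# Proposed change: the encoder still sees (x, y), but the decoder
# receives only z, so whatever information about the condition is
# needed for reconstruction has to be squeezed through z.
x_dim, y_dim, z_dim, h_dim = 784, 10, 20, 400
decoder = nn.Sequential(
    nn.Linear(z_dim, h_dim),  # was: nn.Linear(z_dim + y_dim, h_dim)
    nn.ReLU(),
    nn.Linear(h_dim, x_dim),
    nn.Sigmoid(),
)
```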
the decoder seems to be able to ignore y and merely extract from the latent sample the information required for reconstructing x
Does this relate to posterior collapse? I have seen papers also use the term representation collapse. I don't see a general solution if that happens.
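For what it's worth, one common heuristic against posterior collapse (a mitigation, not a general solution) is KL annealing, where the weight on the KL term starts near zero and ramps up during training; a minimal sketch:

```python
def kl_weight(step, warmup_steps=10_000):
    # Linear KL annealing ("beta warm-up"): with a near-zero KL weight
    # early in training, the model is pushed to put information into z
    # before the KL term starts pulling q(z|x, y) toward the prior.
    return min(1.0, step / warmup_steps)

# In the training loop: loss = rec + kl_weight(step) * kl
```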
Upvotes: 0