Reputation: 175
A variational autoencoder optimizes the ELBO objective:

$$\log p_\theta(x) \;\geq\; \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] \;-\; D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)$$
Nowhere is the ELBO based on a joint or factorized probability distribution of two variables x and y; nothing in the objective itself forces the model to make use of the condition. Therefore, the encoder of a conditional VAE seems to be able to ignore y (the condition) entirely, and merely retain the information required for reconstructing x (the image). Similarly, the decoder seems to be able to ignore y and merely extract from the latent sample the information required for reconstructing x. Given a setup such as the picture below (courtesy of Montserrat, Bustamante & Ioannidis), why is this not a more frequently occurring problem in practice? Is it because the condition already provides a significant part of the representation in a simple form?
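For concreteness, here is a minimal sketch of the conditioning pattern in such a setup (layer names and sizes are illustrative, not taken from the referenced paper): y is concatenated to the encoder input and to the latent sample before decoding, yet the loss itself never mentions y.

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    """Minimal conditional VAE sketch: y is concatenated to both the
    encoder input and the latent sample fed to the decoder."""

    def __init__(self, x_dim=784, y_dim=10, z_dim=20, h_dim=400):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + y_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(
            nn.Linear(z_dim + y_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, x_dim), nn.Sigmoid(),
        )

    def forward(self, x, y):
        h = self.enc(torch.cat([x, y], dim=-1))               # encoder sees (x, y)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        x_hat = self.dec(torch.cat([z, y], dim=-1))           # decoder sees (z, y)
        return x_hat, mu, logvar

def neg_elbo(x, x_hat, mu, logvar):
    # Reconstruction term + KL(q(z|x,y) || N(0,I)). Note that neither
    # term mentions y: nothing in the loss forces the network to use it.
    rec = nn.functional.binary_cross_entropy(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl

# Usage with dummy data:
model = CVAE()
x = torch.rand(8, 784)                                        # flattened images
y = nn.functional.one_hot(torch.arange(8) % 10, 10).float()  # one-hot labels
x_hat, mu, logvar = model(x, y)
loss = neg_elbo(x, x_hat, mu, logvar)
```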
Upvotes: 1
Views: 159
Reputation: 1
Therefore, the encoder of a conditional VAE seems to be able to ignore y (the condition) entirely,
My intuition might be wrong, but what about removing the condition that is fed into the decoder? I feel like that would force the encoder to compress the condition into z.
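If it helps, here is a sketch of that change (sizes are illustrative, relative to the usual setup where the decoder input is the concatenation of z and y):

```python
import torch.nn as nn

# Proposed change: the encoder still sees (x, y), but the decoder
# receives only z, so whatever information about the condition is
# needed for reconstruction has to be squeezed through z.
x_dim, y_dim, z_dim, h_dim = 784, 10, 20, 400
decoder = nn.Sequential(
    nn.Linear(z_dim, h_dim),  # was: nn.Linear(z_dim + y_dim, h_dim)
    nn.ReLU(),
    nn.Linear(h_dim, x_dim),
    nn.Sigmoid(),
)
```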
the decoder seems to be able to ignore y and merely extract from the latent sample the information required for reconstructing x
Does this relate to posterior collapse? I have seen papers also use the term representation collapse. I don't see a general solution if that happens.
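For what it's worth, one common heuristic against posterior collapse (a mitigation, not a general solution) is KL annealing, where the weight on the KL term starts near zero and ramps up during training; a minimal sketch:

```python
def kl_weight(step, warmup_steps=10_000):
    # Linear KL annealing ("beta warm-up"): with a near-zero KL weight
    # early in training, the model is pushed to put information into z
    # before the KL term starts pulling q(z|x, y) toward the prior.
    return min(1.0, step / warmup_steps)

# In the training loop: loss = rec + kl_weight(step) * kl
```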
Upvotes: 0