CycleGAN for unpaired image to image translation

Question

Referring to the original paper on CycleGAN i am confused about this line

The optimal G thereby translates the domain X to a domain Yˆ distributed identically to Y . However, such a translation does not guarantee that an individual input x and output y are paired up in a meaningful way – there are infinitely many mappings G that will induce the same distribution over yˆ.

I understand there are two sets of images and there is no pairing between them so when generator will taken one image lets say x from set X as input and try to translate it to an image similar to the images in Y set then my question is that there are many images present in the set Y so which y will our x be translated into? There are so many options available in set Y. Is that what is pointed out in these lines of the paper that i have written above? And is this the reason we take cyclic loss to overcome this problem and to create some type of pairing between any two random images by converting x to y and then converting y back to x?

youngpanda · Accepted Answer

The image x won't be translated to a concrete image y but rather to a "style" of the domain Y. The input is fed to the generator, which tries to produce a sample from the desired distribution (the other domain), the generated image then goes to the discriminator, which tries to predict if the sample is from the actual distribution or produced by the generator. This is just the normal GAN workflow.

If I understand it correctly, in the lines you quoted, authors explain the problems that arise with adversarial loss. They say it again here:

Adversarial training can, in theory, learn mappings G and F that produce outputs identically distributed as target domains Y and X respectively. However, with large enough capacity, a network can map the same set of input images to any random permutation of images in the target domain, where any of the learned mappings can induce an output distribution that matches the target distribution. Thus, an adversarial loss alone cannot guarantee that the learned function can map an individual input x_i to a desired output y_i.

This is one of the reasons for introducing the concept of cycle-consistency to produce meaningful mappings, reduce the space of possible mapping functions (can be viewed as a form of regularization). The idea is not to create a pairing between 2 random images which already are in the dataset (the dataset stays unpaired), but to make sure, that if you map a real image from the domain X to the domain Y and then back again, you get the original image back.

Cycle consistency encourages generators to avoid unnecessary changes and thus to generate images that share structural similarity with inputs, it also prevents generators from excessive hallucinations and mode collapse.

I hope that answers your questions.

CycleGAN for unpaired image to image translation

Answers (1)

Related Questions