Reputation: 43
I want to fine-tune a generative diffusion model (DDPM), let's say one trained on ImageNet (NOT Stable Diffusion, which is text2img), on other data such as CelebA or CIFAR-10. I wonder two things:

1. Should I start from a class-conditional or an unconditional pre-trained model, and can one kind be fine-tuned into the other?
2. Does the image size the model was trained on matter, i.e., can a model pre-trained at one resolution be fine-tuned on images of a different size?
So far I have found some models from OpenAI trained on ImageNet in both ways, but I haven't tried that yet. Some theoretical input would be greatly appreciated.
Upvotes: 0
Views: 661
Reputation: 31
To answer your first question: training as conditional or unconditional depends on your use case. If the task you want to achieve is generation specific to classes, use a conditional model; if the task is just to generate samples without any conditioning input, use an unconditional one.
Unconditional pre-trained models can be fine-tuned to work as conditional ones, either with classifier guidance or with a classifier-free approach. I'm not too sure about taking a conditional model and fine-tuning it into an unconditional one, but it should mostly be possible.
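As a rough illustration of the classifier-free route, here is a minimal sketch in PyTorch. The toy `TinyEpsModel` and the `guided_eps` helper are made up for this answer (they don't come from any particular codebase): a learned "null" class index provides the unconditional mode, and the guidance function blends the conditional and unconditional predictions at sampling time.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEpsModel(nn.Module):
    """Toy noise-predictor standing in for a real diffusion UNet."""
    def __init__(self, channels=64, num_classes=10, emb_dim=128):
        super().__init__()
        self.time_emb = nn.Sequential(
            nn.Linear(1, emb_dim), nn.SiLU(), nn.Linear(emb_dim, emb_dim))
        # Added for conditional fine-tuning: index `num_classes` is a learned
        # "null" label, so the same network also gives unconditional outputs.
        self.class_emb = nn.Embedding(num_classes + 1, emb_dim)
        self.in_conv = nn.Conv2d(3, channels, 3, padding=1)
        self.emb_proj = nn.Linear(emb_dim, channels)
        self.out_conv = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, x, t, y):
        emb = self.time_emb(t[:, None].float()) + self.class_emb(y)
        h = self.in_conv(x) + self.emb_proj(emb)[:, :, None, None]
        return self.out_conv(F.silu(h))

def guided_eps(model, x, t, y, null_y, w=3.0):
    # Classifier-free guidance: push the conditional prediction away
    # from the unconditional one by the guidance weight w.
    eps_cond = model(x, t, y)
    eps_uncond = model(x, t, null_y)
    return eps_uncond + w * (eps_cond - eps_uncond)

model = TinyEpsModel()
x = torch.randn(4, 3, 32, 32)                     # noisy images
t = torch.randint(0, 1000, (4,))                  # diffusion timesteps
y = torch.randint(0, 10, (4,))                    # class labels
null_y = torch.full((4,), 10, dtype=torch.long)   # the null-class index
print(guided_eps(model, x, t, y, null_y).shape)   # torch.Size([4, 3, 32, 32])
```

During fine-tuning you would replace the real label with the null index some fraction of the time (around 10% is common), so the same network learns both the conditional and the unconditional prediction.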
For your second question: convolutions can handle inputs of varying spatial size, as long as the depth (number of channels) stays fixed. But a model trained on smaller images and then used on bigger ones may lack performance, since it never learned to capture detail efficiently at the larger scale. Both the convolutions and the attention blocks used in diffusion models are size-agnostic.
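Here is a minimal sketch of why that holds, again with a made-up toy module (`ConvAttnBlock`): none of the weight shapes in a conv + self-attention block depend on the spatial size, only on the channel count, so the same weights run unchanged at 32x32 and 64x64.

```python
import torch
import torch.nn as nn

class ConvAttnBlock(nn.Module):
    """Conv + self-attention, as in a diffusion UNet: weight shapes
    depend only on the channel count, not on the spatial size."""
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.norm = nn.GroupNorm(8, channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        h = self.conv(x)
        b, c, hh, ww = h.shape
        seq = self.norm(h).flatten(2).transpose(1, 2)  # (B, H*W, C)
        a, _ = self.attn(seq, seq, seq)                # attends over pixels
        return h + a.transpose(1, 2).reshape(b, c, hh, ww)

block = ConvAttnBlock()
for size in (32, 64):                    # e.g. CIFAR-10-sized vs. larger
    x = torch.randn(1, 64, size, size)
    print(size, block(x).shape)          # the same weights handle both
```

Keep in mind that self-attention cost grows with the square of the number of pixels, so even though the shapes work out, running a model trained at a small resolution on much larger images gets expensive as well as less accurate.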
Note, though, that the pre-trained weights provided for DDPM in this link include an unconditional model only for 256x256 images; all the others are conditional, so check that too.
Upvotes: 1