I'm trying to do inpainting with a subject that a LoRA has been trained on. Since the base model does not handle inpainting well on its own, I tried loading a ControlNet that is made for inpainting.
ControlNet inpainting: lllyasviel/control_v11p_sd15_inpaint
Base model: runwayml/stable-diffusion-v1-5
LoRA: self fine-tuned LoRA on the base model
Another option would be to train the LoRA on runwayml/stable-diffusion-inpainting, but the dimensions don't seem to be right: the tools available for fine-tuning a LoRA raise an error about dimensions.
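I suspect the dimension error comes from the inpainting UNet taking more input channels than the base UNet (the extra channels carry the masked-image latents and the mask). A minimal way to check this, assuming a recent diffusers version (load_config only fetches the config JSON, not the weights):

from diffusers import UNet2DConditionModel

# Compare the UNet input channels of the two checkpoints
base_cfg = UNet2DConditionModel.load_config(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
inpaint_cfg = UNet2DConditionModel.load_config(
    "runwayml/stable-diffusion-inpainting", subfolder="unet"
)
print(base_cfg["in_channels"])     # expected: 4 (latents only)
print(inpaint_cfg["in_channels"])  # expected: 9 (latents + masked-image latents + mask)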
The following code loads a LoRA trained on runwayml/stable-diffusion-v1-5. Prompting with the trigger "sks chair" gives the expected result (this is without the inpaint ControlNet), proving the LoRA has learned the subject.
from diffusers import StableDiffusionPipeline
import torch

generator = torch.Generator(device="cpu").manual_seed(2)  # fixed seed for reproducibility

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe.load_lora_weights('/content/drive/MyDrive/Loras', weight_name="fine_tuned_lora.safetensors")
_ = pipe.to("cuda")

# generate image
image = pipe(
    "sks chair",
    guidance_scale=8,
    num_inference_steps=40,
    generator=generator,
    cross_attention_kwargs={"scale": 0.99},  # LoRA weight
    eta=1.0,  # only used by DDIM-style schedulers
).images[0]
image
Correct result: (image)
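To rule out the LoRA simply failing to load, a sanity check like this can be run right after load_lora_weights (a sketch, assuming a diffusers version with the PEFT backend installed):

print(pipe.get_active_adapters())  # e.g. ['default_0'] once a LoRA is attached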
The following code includes the inpaint ControlNet:
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline
import numpy as np
import torch
from PIL import Image

generator = torch.Generator(device="cpu").manual_seed(2)  # same seed as above

init_image = Image.open("empty_room_image.jpg").convert("RGB")
mask_image = Image.open("empty_room_image_mask.jpg").convert("RGB")
init_image = init_image.resize((512, 512))
mask_image = mask_image.resize((512, 512))
def make_inpaint_condition(image, image_mask):
    image = np.array(image.convert("RGB")).astype(np.float32) / 255.0
    image_mask = np.array(image_mask.convert("L")).astype(np.float32) / 255.0
    assert image.shape[0:2] == image_mask.shape[0:2], "image and image_mask must have the same image size"
    image[image_mask > 0.5] = -1.0  # masked pixels are set to -1.0, as this controlnet expects
    image = np.expand_dims(image, 0).transpose(0, 3, 1, 2)  # HWC -> NCHW
    image = torch.from_numpy(image)
    return image
control_image = make_inpaint_condition(init_image, mask_image)
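# Optional sanity check (illustrative): the conditioning tensor should be
# shaped (1, 3, 512, 512), with masked pixels at -1.0
print(control_image.shape)
print(control_image.min().item())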
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.load_lora_weights('/content/drive/MyDrive/Loras', weight_name="fine_tuned_lora.safetensors")
_ = pipe.to("cuda")
image = pipe(
    prompt="sks chair",
    num_inference_steps=20,
    guidance_scale=8,  # around 8 looks good
    controlnet_conditioning_scale=0.9,
    control_guidance_end=0.9,  # drop controlnet guidance for the last 10% of steps
    cross_attention_kwargs={"scale": .96},  # LoRA weight
    generator=generator,
    eta=1.0,
    image=init_image,
    mask_image=mask_image,
    control_image=control_image,
).images[0]
image
Result: (image)
For some reason the LoRA is not triggered and the results are not meaningful. Does anyone have ideas to improve this?
I also tried SDXL with standard inpainting given a mask, with similar results.
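To isolate whether the ControlNet is drowning out the LoRA, one diagnostic I can think of (untested sketch) is to rerun the same pipeline with the ControlNet influence disabled and see whether the subject appears:

# Same call as above, but with the controlnet residuals zeroed out
image_no_cn = pipe(
    prompt="sks chair",
    num_inference_steps=20,
    guidance_scale=8,
    controlnet_conditioning_scale=0.0,  # disable the controlnet's influence
    cross_attention_kwargs={"scale": .96},
    generator=generator,
    image=init_image,
    mask_image=mask_image,
    control_image=control_image,  # still required by the pipeline signature
).images[0]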