I'm trying to do inpainting with a subject that a LoRA has been trained on. Since the base model does not handle inpainting well on its own, I tried loading a ControlNet that is made for inpainting.
ControlNet inpainting: lllyasviel/control_v11p_sd15_inpaint
Base model: runwayml/stable-diffusion-v1-5
LoRA: self fine-tuned LoRA on the base model
Another option would be to train the LoRA on runwayml/stable-diffusion-inpainting, but the dimensions don't seem to be right: the tools available for fine-tuning a LoRA raise an error about dimensions.
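I suspect the dimension error comes from the inpainting UNet taking more input channels than the base UNet (the extra channels carry the masked-image latents and the mask). A minimal way to check this, assuming a recent diffusers version (load_config only fetches the config JSON, not the weights):

from diffusers import UNet2DConditionModel

# Compare the UNet input channels of the two checkpoints
base_cfg = UNet2DConditionModel.load_config(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
inpaint_cfg = UNet2DConditionModel.load_config(
    "runwayml/stable-diffusion-inpainting", subfolder="unet"
)
print(base_cfg["in_channels"])     # expected: 4 (latents only)
print(inpaint_cfg["in_channels"])  # expected: 9 (latents + masked-image latents + mask)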
The following code loads a LoRA trained on runwayml/stable-diffusion-v1-5. Prompting with the trigger "sks chair" gives the expected result (this is without the inpaint ControlNet), proving the LoRA has learned the subject.
from diffusers import StableDiffusionPipeline
import torch

generator = torch.Generator(device="cpu").manual_seed(2)  # fixed seed for reproducibility

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe.load_lora_weights('/content/drive/MyDrive/Loras', weight_name="fine_tuned_lora.safetensors")
_ = pipe.to("cuda")

# generate image
image = pipe(
    "sks chair",
    guidance_scale=8,
    num_inference_steps=40,
    generator=generator,
    cross_attention_kwargs={"scale": 0.99},  # LoRA weight
    eta=1.0,  # only used by DDIM-style schedulers
).images[0]
image
Correct result: (image)
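To rule out the LoRA simply failing to load, a sanity check like this can be run right after load_lora_weights (a sketch, assuming a diffusers version with the PEFT backend installed):

print(pipe.get_active_adapters())  # e.g. ['default_0'] once a LoRA is attached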
The following code includes the inpaint ControlNet:
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline
import numpy as np
import torch
from PIL import Image

generator = torch.Generator(device="cpu").manual_seed(2)  # same seed as above

init_image = Image.open("empty_room_image.jpg").convert("RGB")
mask_image = Image.open("empty_room_image_mask.jpg").convert("RGB")
init_image = init_image.resize((512, 512))
mask_image = mask_image.resize((512, 512))
def make_inpaint_condition(image, image_mask):
    image = np.array(image.convert("RGB")).astype(np.float32) / 255.0
    image_mask = np.array(image_mask.convert("L")).astype(np.float32) / 255.0
    assert image.shape[0:2] == image_mask.shape[0:2], "image and image_mask must have the same image size"
    image[image_mask > 0.5] = -1.0  # masked pixels are set to -1.0, as this controlnet expects
    image = np.expand_dims(image, 0).transpose(0, 3, 1, 2)  # HWC -> NCHW
    image = torch.from_numpy(image)
    return image
control_image = make_inpaint_condition(init_image, mask_image)
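# Optional sanity check (illustrative): the conditioning tensor should be
# shaped (1, 3, 512, 512), with masked pixels at -1.0
print(control_image.shape)
print(control_image.min().item())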
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.load_lora_weights('/content/drive/MyDrive/Loras', weight_name="fine_tuned_lora.safetensors")
_ = pipe.to("cuda")
image = pipe(
    prompt="sks chair",
    num_inference_steps=20,
    guidance_scale=8,  # around 8 looks good
    controlnet_conditioning_scale=0.9,
    control_guidance_end=0.9,  # drop controlnet guidance for the last 10% of steps
    cross_attention_kwargs={"scale": .96},  # LoRA weight
    generator=generator,
    eta=1.0,
    image=init_image,
    mask_image=mask_image,
    control_image=control_image,
).images[0]
image
Result: (image)
For some reason the LoRA is not triggered and the results are not meaningful. Does anyone have ideas to improve this?
I also tried SDXL with standard inpainting given a mask, with similar results.
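To isolate whether the ControlNet is drowning out the LoRA, one diagnostic I can think of (untested sketch) is to rerun the same pipeline with the ControlNet influence disabled and see whether the subject appears:

# Same call as above, but with the controlnet residuals zeroed out
image_no_cn = pipe(
    prompt="sks chair",
    num_inference_steps=20,
    guidance_scale=8,
    controlnet_conditioning_scale=0.0,  # disable the controlnet's influence
    cross_attention_kwargs={"scale": .96},
    generator=generator,
    image=init_image,
    mask_image=mask_image,
    control_image=control_image,  # still required by the pipeline signature
).images[0]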