Yao Liu

Reputation: 1

RuntimeError: Input type (c10::BFloat16) and bias type (c10::Half) should be the same

I'm running the DeepSeek-VL image-description example:

import torch
from transformers import AutoModelForCausalLM

from deepseek_vl.models import MultiModalityCausalLM, VLChatProcessor
from deepseek_vl.utils.io import load_pil_images

# model_path points at my local DeepSeek-VL checkpoint
vl_chat_processor: VLChatProcessor = VLChatProcessor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer

vl_gpt: MultiModalityCausalLM = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
vl_gpt = vl_gpt.to(torch.float16).cuda().eval()  # cast the whole model to float16 (Half)

conversation = [
    {
        "role": "User",
        "content": "<image_placeholder>Describe each stage of this image.",
        "images": ["/home/ouyangjun/workspace/data/a/liuyao/SceneSayer-main/images/training_pipelines.jpg"]
    },
    {
        "role": "Assistant",
        "content": ""
    }
]

# load images and prepare for inputs
pil_images = load_pil_images(conversation)
prepare_inputs = vl_chat_processor(
    conversations=conversation,
    images=pil_images,
    force_batchify=True
).to(vl_gpt.device)
inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)
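To see which tensors disagree before that last call, I added a quick dtype check (hypothetical debug snippet; the attribute chain follows the frames in the traceback below, and I'm assuming the processor output exposes its image tensor as pixel_values):

# compare the vision tower's weight dtype with the processed image tensor's dtype
print(vl_gpt.vision_model.vision_tower.patch_embed.proj.weight.dtype)  # torch.float16 (Half) after .to(torch.float16)
print(prepare_inputs.pixel_values.dtype)  # torch.bfloat16 (BFloat16), per the error below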

The error occurs at this last step. A debug print also shows images shape: torch.Size([1, 3, 384, 384]). The traceback is as follows:

Traceback (most recent call last):
  File "/home/ouyangjun/workspace/data/a/liuyao/DeepSeek-VL-main/inference.py", line 71, in <module>
    inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)
  File "/home/ouyangjun/workspace/data/a/liuyao/DeepSeek-VL-main/deepseek_vl/models/modeling_vlm.py", line 155, in prepare_inputs_embeds
    vision_output = self.vision_model(images)
  File "/home/sturg/anaconda3/envs/mistral/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ouyangjun/workspace/data/a/liuyao/DeepSeek-VL-main/deepseek_vl/models/clip_encoder.py", line 121, in forward
    image_forward_outs = self.vision_tower(images, **self.forward_kwargs)
  File "/home/sturg/anaconda3/envs/mistral/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ouyangjun/workspace/data/a/liuyao/DeepSeek-VL-main/deepseek_vl/models/siglip_vit.py", line 586, in forward
    x = self.forward_features(x)
  File "/home/ouyangjun/workspace/data/a/liuyao/DeepSeek-VL-main/deepseek_vl/models/siglip_vit.py", line 563, in forward_features
    x = self.patch_embed(x)
  File "/home/sturg/anaconda3/envs/mistral/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sturg/anaconda3/envs/mistral/lib/python3.10/site-packages/timm/layers/patch_embed.py", line 131, in forward
    x = self.proj(x)
  File "/home/sturg/anaconda3/envs/mistral/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/sturg/anaconda3/envs/mistral/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/sturg/anaconda3/envs/mistral/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (c10::BFloat16) and bias type (c10::Half) should be the same
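For reference, the mismatch is easy to reproduce outside DeepSeek-VL. This standalone sketch (not the model's code) fails with the same error, which convinced me the problem is purely a dtype disagreement between the image tensor and the model weights:

import torch
import torch.nn as nn

# conv layer with weights and bias in float16 (c10::Half), like the model after .to(torch.float16)
conv = nn.Conv2d(3, 8, kernel_size=3).to(torch.float16).cuda()

# input in bfloat16 (c10::BFloat16), like the processed image tensor
x = torch.randn(1, 3, 384, 384, dtype=torch.bfloat16, device="cuda")

conv(x)  # RuntimeError: Input type (c10::BFloat16) and bias type (c10::Half) should be the same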

I tried forcing the integer tensors in prepare_inputs to torch.long:

if hasattr(prepare_inputs, 'input_ids'):
    prepare_inputs.input_ids = prepare_inputs.input_ids.to(torch.long)
if hasattr(prepare_inputs, 'images_seq_mask'):
    prepare_inputs.images_seq_mask = prepare_inputs.images_seq_mask.to(torch.long)
if hasattr(prepare_inputs, 'images_emb_mask'):
    prepare_inputs.images_emb_mask = prepare_inputs.images_emb_mask.to(torch.long)

But that causes many indexing errors, presumably because the masks are meant to stay boolean. How can I fix the original dtype mismatch?
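What I suspect I actually need is to cast only the floating-point image tensor to the model's dtype and leave the ids/masks alone, something like this untested sketch (again assuming the processor output stores the image tensor as pixel_values, which is what reaches the vision tower as images in the traceback):

prepare_inputs = vl_chat_processor(
    conversations=conversation,
    images=pil_images,
    force_batchify=True
).to(vl_gpt.device)

# cast only the image tensor; input_ids and the masks keep their integer/bool dtypes
prepare_inputs.pixel_values = prepare_inputs.pixel_values.to(torch.float16)

inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)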

Upvotes: 0

Views: 28

Answers (0)
