BLIP encoder-based

Asked by Nina Grundlingh
Tags: huggingface-transformers, caption, huggingface
Upvotes: 0 · Views: 311

I want to use BLIP for image captioning. How can I ensure that captions are generated by an encoder and not a decoder? I've been using the Hugging Face model: https://huggingface.co/docs/transformers/model_doc/blip

Thanks!

I tried setting is_decoder=False in BlipTextConfig to configure the model; however, I can't get my model to train.
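For reference, here is a minimal sketch of the configuration change described in the question. It assumes a recent `transformers` install and uses library defaults for everything except `is_decoder`; the variable names are illustrative only. It shows only how the flag is set on the text config and carried into the combined `BlipConfig`. It is not a confirmed way to obtain encoder-only captioning.

```python
from transformers import BlipConfig, BlipTextConfig, BlipVisionConfig

# Text-side config with the flag from the question turned off.
# The library default for BlipTextConfig.is_decoder is True.
text_config = BlipTextConfig(is_decoder=False)

# Vision-side config left at its defaults (assumption: no custom image settings).
vision_config = BlipVisionConfig()

# Combine both into the top-level BLIP config; sub-configs are passed as dicts.
config = BlipConfig(
    text_config=text_config.to_dict(),
    vision_config=vision_config.to_dict(),
)

print("default is_decoder:", BlipTextConfig().is_decoder)
print("configured is_decoder:", config.text_config.is_decoder)
```

Note that in the Transformers implementation, `BlipForConditionalGeneration` pairs a vision encoder with a text decoder that produces the caption tokens, so changing this flag alters how the text model is configured rather than moving generation into the vision encoder.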

Answers (0)
