Reputation: 1
I want to use OpenAI's CLIP model to perform multimodal named entity recognition (NER) on an image-text dataset.
I have converted the image-text pairs into embeddings, but how do I perform NER on them now? Or is there a better approach that uses the CLIP model?
Upvotes: 0
Views: 198