Reputation: 23
I am working on a project that uses Azure AI Search (formerly Azure Cognitive Search) for document search. We use Ada embeddings to create semantic vectors for our documents. However, we have a requirement to support cross-language search, specifically querying Japanese documents with English questions.
My questions are:
From my understanding, Ada Embeddings is a language model developed by OpenAI primarily used for generating high-quality text. While it can create semantic embeddings for different languages, it does not inherently support cross-language query search.
To enable cross-language query search in Azure Cognitive Search with Ada Embeddings, I believe we would need to incorporate language translation. I am considering the Text Translation skill for this purpose.
I would greatly appreciate any insights, guidance, or best practices on how to implement cross-language query search in Azure Cognitive Search with Ada Embeddings. Additionally, if there are any alternative approaches or considerations that I should be aware of, please let me know.
Thank you in advance for your help!
Upvotes: 0
Views: 367
Reputation: 466
Regardless of language, a given embedding model should place semantically similar text close together in the same vector space. So as long as you embed both the documents and the queries with the same model, pure vector search should work without translation.
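To illustrate the point above, here is a minimal sketch of how vector ranking works once the query and the documents live in the same space. The three-dimensional vectors and document IDs are made up for illustration; in practice they would come from the embedding model, and how well English and Japanese text actually line up depends on that model, so test it with your own data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vector, doc_vectors):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vec))
        for doc_id, vec in doc_vectors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors standing in for embeddings of Japanese documents.
doc_vectors = {
    "ja-refund-policy": [0.9, 0.1, 0.0],
    "ja-shipping-times": [0.1, 0.9, 0.2],
}

# Hypothetical toy vector standing in for the embedding of an English query.
query_vector = [0.8, 0.2, 0.1]

ranking = rank_documents(query_vector, doc_vectors)
```

The ranking never looks at the language of the text, only at the vectors, which is why no translation step is needed for this part.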
However, your use case may include queries that vectors alone won't match well (for example product references or very specific terms). In that case it may be convenient to use hybrid search, which combines keyword and vector matching. Because the keyword part compares literal text, you would then need translation: either translate the query in the application before you issue it, or use the Text Translation skill at ingestion time.
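As a rough sketch of the query-side option, the function below builds a hybrid search request body: the keyword part uses the translated text, while the vector part uses the embedding of the original English query. The body shape is my assumption based on the 2023-11-01 REST API (`search` plus `vectorQueries`), so check it against the API version you use. The field name `contentVector`, the `translate` callable, and `fake_translate` are hypothetical placeholders; a real application would call a translation service there.

```python
def build_hybrid_query(query_text, query_vector, translate,
                       vector_field="contentVector", k=5):
    """Build a hybrid (keyword + vector) search request body.

    translate: callable mapping the user's query to the document language.
    vector_field: name of the vector field in the index (assumed here).
    """
    translated_text = translate(query_text)
    return {
        # Keyword part: must be in the same language as the indexed text.
        "search": translated_text,
        # Vector part: embedding of the original query, no translation needed.
        "vectorQueries": [
            {
                "kind": "vector",
                "vector": list(query_vector),
                "fields": vector_field,
                "k": k,
            }
        ],
        "top": k,
    }


def fake_translate(text):
    """Hypothetical stand-in for a real English-to-Japanese translation call."""
    lookup = {"refund policy": "返金ポリシー"}
    return lookup.get(text, text)


body = build_hybrid_query("refund policy", [0.8, 0.2, 0.1], fake_translate)
```

If you translate at ingestion time instead, the index would hold an English copy of each document and the keyword part could use the untranslated query.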
Upvotes: 1