Reputation: 31
I am trying to use crewai_tools, the RAG tools in particular. Let's take DirectorySearchTool() as the example here.
How do I use a custom embedder provider? The only provider options that embedchain offers are 'openai', 'gpt4all', 'huggingface', 'vertexai', 'azure_openai', 'google', 'mistralai', and 'nvidia', i.e. embedding models for which I need API keys.
directory_search_tool = DirectorySearchTool(
    config=dict(
        llm=dict(
            provider="ollama",  # Options include ollama, google, anthropic, llama2, and more
            config=dict(
                model="llama2",
                # Additional configurations here
            ),
        ),
        embedder=dict(
            provider="google",  # or openai, ollama, ...
            config=dict(
                model="models/embedding-001",
            ),
        ),
    )
)
I know that CrewAI uses the embedchain module for RAG operations and that the default provider is openai. To use the openai embedding model, one needs to set OPENAI_API_KEY as an environment variable.
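For reference, here is a minimal sketch of that default setup (the directory path and key are placeholders, and the directory keyword argument is my assumption about the tool's constructor):

import os
from crewai_tools import DirectorySearchTool

# Default behaviour: no embedder config is passed, so embedchain falls back to
# its openai embedder and reads the API key from OPENAI_API_KEY.
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder

directory_search_tool = DirectorySearchTool(directory="./docs")  # placeholder path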
But the LLM I am using comes from my company's/organization's own wrapper around langchain's ChatOpenAI, and I don't need to set API keys or anything else to use it. Something like this:
llm = ChatOpenAI(proxy_model_name='gpt-4-32k')
which is of type: gen_ai_hub.proxy.langchain.openai.ChatOpenAI
Question: How do I use a custom embedder provider? Is there a provider called custom, or anything else that I can use in my case?
I'm not really sure what to try. Any ideas would be helpful.
Upvotes: 1
Views: 1124
Reputation: 1
I've used Hugging Face as the provider, with a compatible embedder chosen based on the embedding dimension and max tokens.
Below is the embedder dict config that worked for me.
embedder=dict(
    provider="huggingface",  # or google, openai, anthropic, llama2, ...
    config=dict(
        model="izhx/udever-bloom-1b1",
    ),
),
Output: Inserted batches to chromadb. Accuracy of the output was not great, though.
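For completeness, here is roughly how that embedder dict plugs into the tool. This is a sketch, assuming the llm section can be omitted when only the embedder needs overriding, and using a placeholder directory path:

from crewai_tools import DirectorySearchTool

directory_search_tool = DirectorySearchTool(
    directory="./docs",  # placeholder path
    config=dict(
        embedder=dict(
            provider="huggingface",
            config=dict(
                model="izhx/udever-bloom-1b1",
            ),
        ),
    ),
)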
Upvotes: 0