Rohan Prasad

Reputation: 31

CrewAI - Can I use a custom embedder in crewai_tools?

I am trying to use crewai_tools, the RAG tools in particular. Let's take DirectorySearchTool() as an example.

How do I use a custom embedder provider? The only options embedchain offers are from the following list: 'openai', 'gpt4all', 'huggingface', 'vertexai', 'azure_openai', 'google', 'mistralai', 'nvidia'. These are embedding models for which I need API keys.

from crewai_tools import DirectorySearchTool

directory_search_tool = DirectorySearchTool(
    config=dict(
        llm=dict(
            provider="ollama",  # Options include ollama, google, anthropic, llama2, and more
            config=dict(
                model="llama2",
                # Additional configurations here
            ),
        ),
        embedder=dict(
            provider="google",  # or openai, ollama, ...
            config=dict(
                model="models/embedding-001",
            ),
        ),
    )
)

I know that CrewAI uses the embedchain module for RAG operations and that the default provider is openai. To use an OpenAI embedding model, one needs to set OPENAI_API_KEY as an environment variable.

However, the LLM I am using comes from my company's own ChatOpenAI class (imported from the organization's langchain.openai module), so I don't need to set API keys to use it. Something like this:
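
To make the constraint concrete, here is a minimal sketch that checks a provider name against the list quoted above and flags a missing OPENAI_API_KEY for the default provider. The helper name check_embedder and the KNOWN_EMBEDDER_PROVIDERS set are hypothetical and not part of crewai_tools or embedchain; the actual list of providers depends on the installed embedchain version.

```python
import os

# Provider names as quoted in this question (an assumption, not read from
# embedchain itself; the real list may differ between versions).
KNOWN_EMBEDDER_PROVIDERS = {
    "openai", "gpt4all", "huggingface", "vertexai",
    "azure_openai", "google", "mistralai", "nvidia",
}


def check_embedder(provider, env=None):
    """Return a list of problems with using `provider` (empty if none found)."""
    env = os.environ if env is None else env
    problems = []
    if provider not in KNOWN_EMBEDDER_PROVIDERS:
        problems.append(f"unknown embedder provider: {provider!r}")
    # The default openai provider reads its key from the environment.
    if provider == "openai" and not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems
```

Under these assumptions, calling check_embedder("custom") reports an unknown provider, which is the situation described in this question.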

llm = ChatOpenAI(proxy_model_name='gpt-4-32k')

which is of type: gen_ai_hub.proxy.langchain.openai.ChatOpenAI

Question: How do I use a custom embedder provider? Is there a provider called custom, or anything similar, that I can use in my case?

I'm not really sure what to try. Any ideas would be helpful.

Upvotes: 1

Views: 1124

Answers (1)

Karthik

Reputation: 1

I used Hugging Face as the provider and picked a compatible embedding model based on its embedding dimension and max tokens.

Below is the embedder dict config that worked for me.

embedder=dict(
    provider="huggingface",  # or google, openai, anthropic, llama2..
    config=dict(
        model="izhx/udever-bloom-1b1",
    ),
),

Output: Inserted batches to chromadb. The accuracy of the results was not great, though.
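
For context, here is a rough sketch of how that embedder dict might slot into the full tool config from the question. The helper build_rag_config is hypothetical (it is not part of crewai_tools); it only assembles the nested dict. The ollama/llama2 LLM settings are copied from the question, not something I tested.

```python
def build_rag_config(llm_provider, llm_model, embedder_provider, embedder_model):
    """Assemble the nested config dict in the shape shown in the question."""
    return dict(
        llm=dict(
            provider=llm_provider,
            config=dict(model=llm_model),
        ),
        embedder=dict(
            provider=embedder_provider,
            config=dict(model=embedder_model),
        ),
    )


config = build_rag_config(
    llm_provider="ollama",            # taken from the question's example
    llm_model="llama2",
    embedder_provider="huggingface",  # the provider that worked for me
    embedder_model="izhx/udever-bloom-1b1",
)

# Usage (assumes crewai_tools is installed):
# from crewai_tools import DirectorySearchTool
# directory_search_tool = DirectorySearchTool(config=config)
```

Keeping the dict construction separate from the tool makes it easy to swap the embedder model and compare retrieval quality.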

Upvotes: 0
