CrazyEight

Reputation: 367

LangChain E-Mails with LLM

I am quite new to LangChain and Python, as I mainly do C#, but I am interested in using AI on my own data. So I wrote some Python code using LangChain that:

  1. Gets my emails via IMAP

  2. Creates JSON from my emails (JSONLoader)

  3. Creates a vector database in which each mail is one vector (FAISS, OpenAIEmbeddings)

  4. Does a similarity search for the query, returning the 3 mails that match it best

  5. Feeds the result of the similarity search to the LLM (GPT-3.5 Turbo) together with the query again

The LLM prompt then looks something like this:

The question is

{query}

Here is some information that can help you answer the question:

{similarity_search_result}
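
Stripped down, steps 3 to 5 look roughly like this (a simplified sketch, not my exact code; I am assuming the mails have already been loaded into LangChain Document objects):

from langchain_core.documents import Document
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

# in my real code these Documents come from the JSON created via JSONLoader
docs = [
    Document(page_content="From: [email protected]\nDate: 10.04.2024 14:11\nSubject: ..."),
    # ... one Document per mail
]

# each mail becomes one vector in the FAISS index
vectorstore = FAISS.from_documents(docs, OpenAIEmbeddings())

query = "When was my last mail sent to [email protected]?"
hits = vectorstore.similarity_search(query, k=3)  # the 3 most similar mails

prompt = (
    f"The question is\n{query}\n\n"
    "Here is some information that can help you answer the question:\n\n"
    + "\n\n".join(doc.page_content for doc in hits)
)

answer = ChatOpenAI(model="gpt-3.5-turbo").invoke(prompt)
print(answer.content)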

OK, so far so good... when my question is:

When was my last mail sent to [email protected]?

I get a correct answer, e.g. "last mail received 10.04.2024 14:11".

But what if I want an answer to the following question:

How many mails have been sent by [email protected]?

Because the similarity search only returns the vectors that are most similar, how can I get an answer about the total count? Even if the similarity search delivered 150 mails sent by [email protected] instead of 3, I can't just feed them all into the LLM prompt, right?

So what is my mistake here?

Upvotes: 0

Views: 1219

Answers (1)

Evan

Reputation: 2301

It sounds like you need what OpenAI calls "function calling" / tools. RAG is great for grabbing relevant documents to dump into the context window, but as you've seen it isn't suitable for everything. Thankfully, function calling lets us add arbitrary capabilities without hacking together our own solution. You first implement a function in Python that does what you want. When you query OpenAI, you provide a description of these functions (tools). The chat completions API can reason about your request and respond with JSON containing the arguments to pass to the function you defined. This hypothetically allows LLMs to take any action a human could.

So, for your case of counting emails by address, you'd implement a Python function that, for example, queries the number of emails for a given sender via IMAP. I'll leave that task to you, but once you've done that, the code below should serve as a minimal working example to build on.

import json
from openai import OpenAI

client = OpenAI(api_key='YOUR API KEY')

tools = [
    {
        "type": "function",
        "function": {
            "name": "total_number_of_emails",
            "description": "Get the number of emails in an email user's inbox",
            "parameters": {
                "type": "object",
                "properties": {
                    "email_address": {
                        "type": "string",
                        "description": "The user's email address",
                    },
                },
                "required": ["email_address"],
            },
        },
    },
]

def total_number_of_emails(email_address):
    return 42 # replace with real code to grab # of emails

def test(query):
    cpl = client.chat.completions.create(
        model='gpt-3.5-turbo',
        messages=[{'role': 'user', 'content': query}],
        tools=tools,
        tool_choice='auto' # lets model decide whether to use a tool
    )
    tool_calls = cpl.choices[0].message.tool_calls
    if not tool_calls:  # the model answered directly without calling a tool
        print(cpl.choices[0].message.content)
        return
    for tool_call in tool_calls:
        fn = tool_call.function
        if fn.name == 'total_number_of_emails':
            # the model returns the chosen arguments as a JSON string
            args = json.loads(fn.arguments)
            print(total_number_of_emails(args['email_address']))

test('How many mails have been sent by [email protected]?')

If you copy and paste the above code unmodified, add your API key, and run it, it should print "42" every time.
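
For the IMAP part, a rough sketch of what total_number_of_emails could look like using Python's standard imaplib (the server, credentials and mailbox are placeholders you'd need to fill in):

import imaplib

def total_number_of_emails(email_address):
    # placeholders: use your real IMAP server and login
    imap = imaplib.IMAP4_SSL("imap.example.com")
    imap.login("YOUR USERNAME", "YOUR PASSWORD")
    imap.select("INBOX", readonly=True)
    # IMAP SEARCH with a FROM criterion returns the ids of matching messages
    status, data = imap.search(None, "FROM", f'"{email_address}"')
    imap.logout()
    if status != "OK":
        return 0
    return len(data[0].split())

With that in place of the stub above, the test call would print the real count instead of 42.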

Upvotes: 1
