Aleshan

Reputation: 41

Multimodal LLM Memory

I am trying to add memory to my application, which is a multimodal RAG-based system.

def run_final_query(llm, query, base64_image=None, compressed_image_str=None):
    # Define the system prompt to set model behavior
    system_prompt = {
        "role": "system",
        "content": "You are a helpful assistant"
    }

    # Prepare the primary query content
    user_content = [{"type": "text", "text": query or "Here is the image context you provided:"}]

    # Conditionally add image content to user input:
    # the image the user uploaded, if any
    if base64_image:
        user_content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}
        })
    # and the combined image retrieved from the documents, if any
    if compressed_image_str:
        user_content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{compressed_image_str}"}
        })

    # Create message structure
    messages = [
        system_prompt,
        {"role": "user", "content": user_content}
    ]

    # Stream the response from the LLM and print each chunk
    for chunk in llm.stream(messages):
        print(chunk.content, end="", flush=True)

    print()

I would like to add memory so that the model has context for all future answers. My RAG pipeline returns an image, compressed_image_str, which is the document context needed to answer questions.

base64_image is an image that the user uploads and can ask questions about.

Here is the calling function:

def answer_query(query="", image_data=None, k=4):
    # Initialize the ChatOpenAI model (also used for image summarization if an image is provided)
    llm = ChatOpenAI(model="gpt-4o", temperature=0)

    search_query = ""
    base64_image = ""
    if image_data is not None:
        # Generate the image summary
        base64_image = compress_image_from_pil(image_data)
        image_summary = generate_image_summary(llm, base64_image)
        search_query = image_summary

    # Append the text query to the search query
    search_query = search_query + " " + query

    # Perform the Byaldi search based on the generated summary or text query
    search_results = model.search(search_query, k=k)

    # Decode images from search results and combine them into one image
    images = [decode_base64_image(result['base64']) for result in search_results]
    combined_image = combine_images_grid(images) if images else None

    # Compress the combined image for use with the ChatGPT model
    compressed_image_str = compress_image_from_pil(combined_image) if combined_image else None

    # Prepare the final query and get the output
    run_final_query(
        llm=llm,
        query=query,
        base64_image=base64_image,
        compressed_image_str=compressed_image_str
    )
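
For context, this is roughly how answer_query gets used. Each call is independent right now, so the follow-up question below knows nothing about the first question or the image that was uploaded with it (the file name is only an example):

from PIL import Image

# First turn: the user uploads an image and asks about it
uploaded = Image.open("report_page.png")  # example upload
answer_query(query="What does this chart show?", image_data=uploaded)

# Follow-up turn: refers back to the previous answer and image,
# but answer_query starts from scratch, so that context is lost
answer_query(query="How does that compare to the previous page?")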

I am struggling to add memory so that the model has the context of past questions, as well as the image the user previously uploaded, when answering follow-up questions.
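
What I was thinking of trying is something like the sketch below: keep a module-level chat_history list, append each user turn and the assistant's reply to it, and prepend it to the messages inside the final-query function. The chat_history name, the run_final_query_with_memory variant, and collecting the streamed chunks into response_text are all my own guesses, not code I have working; I'm not sure this is the right way to keep the uploaded image in context without making the prompt huge:

# Sketch only: module-level history, assuming run_final_query is changed
# to take prior turns into account and to return the full response text.
chat_history = []  # hypothetical: list of {"role": ..., "content": ...} dicts

def run_final_query_with_memory(llm, query, base64_image=None, compressed_image_str=None):
    system_prompt = {"role": "system", "content": "You are a helpful assistant"}

    user_content = [{"type": "text", "text": query or "Here is the image context you provided:"}]
    if base64_image:
        user_content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}
        })
    if compressed_image_str:
        user_content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{compressed_image_str}"}
        })

    # Prepend the system prompt and all earlier turns before the new user message
    messages = [system_prompt] + chat_history + [{"role": "user", "content": user_content}]

    # Stream the response while collecting it so it can be stored in the history
    response_text = ""
    for chunk in llm.stream(messages):
        print(chunk.content, end="", flush=True)
        response_text += chunk.content
    print()

    # Remember this turn for future calls (text only here, to keep the history small)
    chat_history.append({"role": "user", "content": query})
    chat_history.append({"role": "assistant", "content": response_text})
    return response_text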

Upvotes: 0

Views: 60

Answers (0)
