Matheus Torquato

Reputation: 1629

Getting Token Usage Metadata from Gemini LLM calls in a LangChain RAG RunnableSequence

I would like to get the token usage of my RAG chain each time it is invoked.

No matter what I do, I can't seem to find the right way to output the total tokens from the Gemini model I'm using.

import vertexai
from langchain_google_vertexai import VertexAI
from vertexai.generative_models import GenerativeModel

vertexai.init(
    project="MY_PROJECT",
    location="MY_LOCATION",
)

question = "What is the meaning of life"

# LangChain wrapper: invoke() returns a plain string
llm = VertexAI(model_name="gemini-1.5-pro-001")
response1 = llm.invoke(question)

# Raw Vertex AI SDK: generate_content() returns a full response object
llm2 = GenerativeModel("gemini-1.5-pro-001")
response2 = llm2.generate_content(question)

response1 above is just a string.

response2 is what I want, i.e. a response object containing usage_metadata, safety_ratings, finish_reason, etc. But I haven't managed to make my RAG chain run using this approach.
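For reference, the token counts on response2 can be read straight off the SDK response (a minimal sketch; the attribute names match the fields reported in the response's usage metadata):

print(response2.usage_metadata.prompt_token_count)
print(response2.usage_metadata.candidates_token_count)
print(response2.usage_metadata.total_token_count)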

My RAG chain is a RunnableSequence (from langchain_core.runnables), which does not accept the class 'vertexai.generative_models.GenerativeModel', so I've also tried using callbacks:

from langchain_google_vertexai import VertexAI
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.outputs import LLMResult


class LoggingHandler(BaseCallbackHandler):
    def on_llm_start(self, serialized, prompts, **kwargs) -> None:
        print('On LLM Start: {}'.format(prompts))

    def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        print('On LLM End: {}'.format(response))


callbacks = [LoggingHandler()]
llm = VertexAI(model_name="gemini-1.5-pro-001")
prompt = ChatPromptTemplate.from_template("What is 1 + {number}?")

chain = prompt | llm

chain_with_callbacks = chain.with_config(callbacks=callbacks)
response = chain_with_callbacks.invoke({"number": "2"})

This prints the output below:

On LLM Start: ['Human: What is 1 + 2?']
On LLM End: generations=[[GenerationChunk(text='Human: What is 1 + 2?\nAssistant: 3 \n', generation_info={'is_blocked': False, 'safety_ratings': [{'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability_label': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability_label': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability_label': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability_label': 'NEGLIGIBLE', 'blocked': False}], 'citation_metadata': })]] llm_output=None run=None

I.e. no usage metadata.

Any idea how to have the usage metadata for each RAG chain call?

Upvotes: 0

Views: 525

Answers (1)

Arindam

Reputation: 81

Instead of VertexAI, use ChatVertexAI:

from langchain_google_vertexai import ChatVertexAI

Its response includes the token counts in usage metadata (total_token_count among them). The returned AIMessage looks like this:

AIMessage(content="J'adore programmer. \n", response_metadata={'is_blocked': False, 'safety_ratings': [{'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability_label': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability_label': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability_label': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability_label': 'NEGLIGIBLE', 'blocked': False}], 'usage_metadata': {'prompt_token_count': 20, 'candidates_token_count': 7, 'total_token_count': 27}}, id='run-7032733c-d05c-4f0c-a17a-6c575fdd1ae0-0', usage_metadata={'input_tokens': 20, 'output_tokens': 7, 'total_tokens': 27})

Format of input:

llm = ChatVertexAI(model_name="gemini-1.5-pro-001")

messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]
ai_msg = llm.invoke(messages)
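Applied to a chain like the one in the question (a minimal sketch; the model name and prompt are taken from the question), the usage metadata survives as long as the chain ends at the chat model rather than a string output parser:

from langchain_google_vertexai import ChatVertexAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatVertexAI(model_name="gemini-1.5-pro-001")
prompt = ChatPromptTemplate.from_template("What is 1 + {number}?")

# The last step is the chat model, so the chain returns the AIMessage
# (and with it usage_metadata) instead of a plain string
chain = prompt | llm
ai_msg = chain.invoke({"number": "2"})

print(ai_msg.content)
print(ai_msg.usage_metadata)  # {'input_tokens': ..., 'output_tokens': ..., 'total_tokens': ...}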

Upvotes: 0
