PythonNewbie

Reputation: 1163

How do OpenAI tokens work, and how can I use fewer tokens?

Hello beautiful people!

I'm currently trying to write my own "AI" with the help of OpenAI. I have been following LangChain and ended up with this code:

import os
import re

import discord
import requests
from discord.ext import commands
from langchain.chains import ConversationalRetrievalChain
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from transformers import GPT2TokenizerFast

intents = discord.Intents.default()
intents.typing = False
intents.presences = False
intents.message_content = True

bot = commands.Bot(command_prefix="!", intents=intents)

# Set up OpenAI API key and models
os.environ["OPENAI_API_KEY"] = 'xxxxxx'


def get_documentation():
    zendesk_url = "https://test.zendesk.com/api/v2/help_center/articles.json"

    documentation = []

    while zendesk_url:
        # Make a GET request to the Zendesk API to fetch articles for the current page
        response = requests.get(
            zendesk_url,
            headers={
                "Authorization": f"Basic xxxx",
                "Content-Type": "application/json"
            })

        # Check if the request was successful
        if response.status_code == 200:
            response_json = response.json()
            # Loop through the articles on the current page
            for article in response_json["articles"]:
                # Extract the title and body of the article
                title = article['title']
                body = article['body']

                # Remove any HTML tags and formatting from the body
                body = re.sub('<[^<]+?>', '', body)

                # Remove all newline characters from the body
                body = body.replace('\n', ' ')

                # Replace non-breaking spaces with regular spaces
                body = body.replace('\xa0', ' ')

                # Append the title and body to the documentation list
                documentation.append((title, body))

            # Check if there are more pages of articles and update the zendesk_url variable if necessary
            next_page_url = response_json["next_page"]
            zendesk_url = next_page_url if next_page_url else None
        else:
            # If the request was not successful, raise an exception with the error message
            response.raise_for_status()

    return documentation


# Load the GPT2 tokenizer
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
print(tokenizer)


# Define a function to count tokens
def count_tokens(text: str) -> int:
    return len(tokenizer.encode(text))


# Create a text splitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=24,
    length_function=count_tokens,
)

# Fetch and clean the documentation
documentation = get_documentation() # The len of documentation is 93

# Extract only the article bodies
article_bodies = [article_body for title, article_body in documentation]

# Split the article bodies into chunks
chunks = text_splitter.create_documents(article_bodies)

# Get embedding model
embeddings = OpenAIEmbeddings()

# Create vector database
db = FAISS.from_documents(chunks, embeddings)

qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0.1), db.as_retriever())


@bot.event
async def on_ready():
    print(f'We have logged in as {bot.user}')


chat_history = []
@bot.command()
async def ask(ctx, *, question):
    print(f"{ctx.author.name} asked: {question}")
    result = qa(
        {
            "question": question,
            "chat_history": chat_history
        }
    )
    chat_history.append((question, result['answer']))
    await ctx.send(result['answer'])


bot.run('xxxxxx')

What I do is connect to my Zendesk, scrape all the documentation by calling get_documentation(), and then split it into chunks. When I then call !ask question here I should get an answer back. However, when I check my latest usage, it ends up using a lot of tokens, and I feel like it might be too much. Could someone explain this, or is there anything I could improve?

[Screenshot of my OpenAI usage dashboard]

I know that when I start the script, it usually ends up using around 46,179 prompt tokens, but I don't really understand why I'm paying before I've even asked a question. How can I improve it to use fewer tokens?
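For reference, here is a rough way to estimate locally how many tokens the startup step sends, assuming (I'm not sure) that the usage comes from embedding every chunk when the FAISS index is built. It just reuses the count_tokens helper and chunks from the code above:

# Rough estimate of the tokens sent to the embeddings endpoint at startup.
# Assumption: the startup usage comes from FAISS.from_documents embedding every chunk.
# Note: the GPT-2 tokenizer only approximates the tokenizer used by OpenAI's embedding models.
total_embedding_tokens = sum(count_tokens(chunk.page_content) for chunk in chunks)
print(f"Approximate tokens embedded at startup: {total_embedding_tokens}")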

Expected:

To use fewer tokens / to only use tokens when I ask a prompt.

Actual:

It uses 40k+ tokens every time I start the script.

Upvotes: 0

Views: 1746

Answers (1)

Yilmaz

Reputation: 49571

From here:

Tokenization is the process of splitting the input and output texts into smaller units that can be processed by the LLM AI models. Tokens can be words, characters, subwords, or symbols, depending on the type and the size of the model. Tokenization can help the model to handle different languages, vocabularies, and formats, and to reduce the computational and memory costs. Tokenization can also affect the quality and the diversity of the generated texts, by influencing the meaning and the context of the tokens. Tokenization can be done using different methods, such as rule-based, statistical, or neural, depending on the complexity and the variability of the texts.

Usage of tokens basically depends on input and output length, and on model configuration. Even a single punctuation mark can be counted as a token by the model. You can test token usage with OpenAI's tokenizer tool.

[Screenshot of the tokenizer output showing an example sentence where "," and "." are counted as separate tokens]

In the above example, "," and "." are each counted as a token. In order to reduce token usage:

  • Keep prompts concise and precise. Avoid repetition, unnecessary punctuation and whitespace, and special characters.

  • Limit output length. In LangChain you can pass the max_tokens named parameter. Longer outputs require more tokens to generate, and when you set a limit on the output length with max_tokens, the model stops generating text once it reaches that limit (see the sketch after this list).

  • As the LLM is updated to a new version it has learned more, and the more the model knows, the fewer tokens it needs. For example, gpt-3.5-turbo is a more token-efficient version of GPT-3.
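A minimal sketch of the max_tokens point, using the langchain OpenAI wrapper from the question (the prompt text and the 256 limit are only example values, and the GPT-2 tokenizer only approximates how newer OpenAI models tokenize):

from langchain.llms import OpenAI
from transformers import GPT2TokenizerFast

# Assumes OPENAI_API_KEY is set in the environment.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

def count_tokens(text: str) -> int:
    # Approximate count; OpenAI's newer models use a different tokenizer.
    return len(tokenizer.encode(text))

# Cap the completion length so long answers cannot inflate output-token usage.
llm = OpenAI(temperature=0.1, max_tokens=256)

prompt = "Summarize the refund policy in two sentences."
print(f"Prompt is roughly {count_tokens(prompt)} tokens")
answer = llm(prompt)  # generation stops at or before the 256-token cap
print(answer)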

Upvotes: 1
