Reputation: 325
I'm starting out with the OpenAI API and experimenting with LangChain. I have a .csv file with approximately 1000 rows and 85 columns of string values. I followed a beginner article and have a Colab notebook with the following code:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

# Load the CSV as one plain-text document
txt_file_path = '/content/drive/My Drive/Colab Notebooks/preprocessed_data_10.csv'
loader = TextLoader(file_path=txt_file_path, encoding="utf-8")
data = loader.load()
# Split the document into overlapping chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
data = text_splitter.split_documents(data)

# Embed the chunks and index them in a FAISS vector store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(data, embedding=embeddings)

# Chat model and conversation memory for the retrieval chain
llm = ChatOpenAI(temperature=0.7, model_name="gpt-4")
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)
conversation_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
    memory=memory
)
query = "question"
result = conversation_chain({"question": query})
answer = result["answer"]
answer
The errors I got were:
Error code: 429 - {'error': {'message': 'Request too large for gpt-4 in organization org-xxx on tokens per min (TPM): Limit 10000, Requested 139816.
and
BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens. However, your messages resulted in 32045 tokens.
I tried to figure out how big a CSV file I could feed it, and reduced the file to 10 rows and 53 columns.
What are possible workarounds so I can search the entire CSV file?
Any help would be much appreciated.
Thanks.
Upvotes: 2
Views: 1864
Reputation: 17215
The OpenAI Assistants API can process CSV files effectively when the Code Interpreter tool is enabled.
Unlike the File Search tool, which does not support CSV files natively, Code Interpreter allows the assistant to parse and analyze CSV data.
When creating or updating your assistant programmatically, include the code_interpreter tool in the configuration.
Below is an example JSON setup:
{
  "name": "CSV Analyzer",
  "instructions": "You are a data analyst who interprets CSV files to answer questions.",
  "model": "gpt-4o",
  "tools": [{"type": "code_interpreter"}]
}
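For reference, a minimal sketch of the same setup using the official openai Python SDK might look like this (it assumes OPENAI_API_KEY is set in your environment):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

assistant = client.beta.assistants.create(
    name="CSV Analyzer",
    instructions="You are a data analyst who interprets CSV files to answer questions.",
    model="gpt-4o",
    tools=[{"type": "code_interpreter"}],
)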
After enabling Code Interpreter, upload your prepared CSV file to the assistant via the API or platform. The assistant can then access and analyze it.
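As a sketch of that flow, assuming a recent openai SDK that provides the create_and_poll helper (the file name and question below are placeholders):

# Upload the CSV; purpose="assistants" makes it available to assistant tools
csv_file = client.files.create(
    file=open("preprocessed_data.csv", "rb"),
    purpose="assistants",
)

# Start a thread whose first message attaches the file for Code Interpreter
thread = client.beta.threads.create(
    messages=[{
        "role": "user",
        "content": "How many rows are in this CSV, and what are its columns?",
        "attachments": [
            {"file_id": csv_file.id, "tools": [{"type": "code_interpreter"}]}
        ],
    }]
)

# Run the assistant on the thread and wait for it to finish
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
)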
Note: When retrieving the thread via the API (e.g., client.beta.threads.messages.list), the Code Interpreter logic is hidden: the messages contain only the assistant's final answer, not the intermediate code it executed.
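So to read the final answer, you would list the thread's messages and take the assistant's text content, for example:

messages = client.beta.threads.messages.list(thread_id=thread.id)
for message in messages.data:
    if message.role == "assistant":
        for block in message.content:
            if block.type == "text":
                print(block.text.value)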
References
https://code.recuweb.com/2025/processing-csv-files-with-openai-assistant-manager/
Upvotes: 0
Reputation: 1
It is not usual to pass a large CSV to an LLM directly. Alternatively, you can have the LLM generate the analysis code and then execute it locally with Python's exec() function. That way the code runs on your PC against the full file, which avoids the token limit.
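A minimal sketch of this approach, assuming the openai and pandas packages; the file name and question are placeholders, and in practice you may need to strip markdown fences from the model's reply:

import pandas as pd
from openai import OpenAI

client = OpenAI()
df = pd.read_csv("preprocessed_data.csv")

question = "How many rows contain the value 'X' in any column?"
prompt = (
    "Write plain Python code (no markdown fences) that answers the question "
    f"using an existing pandas DataFrame named `df` with columns {list(df.columns)}. "
    "Assign the result to a variable named `answer`.\n"
    f"Question: {question}"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
generated_code = response.choices[0].message.content

# Caution: exec() runs model-generated code with full local privileges,
# so review or sandbox the code before executing it.
scope = {"df": df}
exec(generated_code, scope)
print(scope["answer"])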
Upvotes: 0