Building a production-level RAG for CSV files

I am tasked with building a production-level RAG application over CSV files. Possible approaches:

  1. Treat the CSVs as unstructured text: chunk the contents, embed them, and retrieve from a vector store.
  2. Load the CSV into a dataframe and have the LLM query it.

The first approach gives vague answers, since it applies an unstructured method to structured data; the second works very well, but I doubt its scalability. I need suggestions.

Upvotes: 0

Views: 1608

Answers (1)

Safder Raza

Reputation: 11

It depends on how many files you have in production and how large they are. Can you describe your use case a bit more?

  • How many CSV files do you expect to have?
  • Are the files related to each other, like tables in a database?

One issue you are going to run into, even with approach #2, is knowing which CSV file to load into the dataframe.

One approach that worked well for me, though I have not used it in a production environment, was:

  1. During indexing, instead of loading the whole CSV into the vector DB, use an LLM to summarize the CSV file. Index the summary and make sure the file path is included in the document metadata.
  2. At retrieval time, you should get good hits thanks to the table summaries. Find the source CSV in the retrieved document's metadata and load it into a dataframe (see the sketch after this list).
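
Here is a minimal sketch of that indexing-plus-routing flow in Python. It assumes the OpenAI v1 client; the model names, the data/*.csv glob, and the in-memory cosine-similarity lookup are illustrative placeholders, not a prescription:

    # Minimal sketch: index one LLM summary per CSV, keep the file path as
    # metadata, then route a query to the best-matching file.
    import glob

    import numpy as np
    import pandas as pd
    from openai import OpenAI

    client = OpenAI()

    def summarize_csv(path: str) -> str:
        # Summarize a sample of the file, not the whole thing.
        head = pd.read_csv(path, nrows=20).to_string()
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # example model name
            messages=[{
                "role": "user",
                "content": "Summarize this table: its columns and what "
                           "each row represents.\n\n" + head,
            }],
        )
        return resp.choices[0].message.content

    def embed(text: str) -> np.ndarray:
        resp = client.embeddings.create(
            model="text-embedding-3-small", input=text)
        return np.array(resp.data[0].embedding)

    # Indexing: one summary vector per file; the path travels as metadata.
    index = []
    for path in glob.glob("data/*.csv"):
        summary = summarize_csv(path)
        index.append({"path": path, "summary": summary,
                      "vector": embed(summary)})

    def retrieve(query: str) -> pd.DataFrame:
        # Cosine similarity against the summaries, then load the winner.
        q = embed(query)
        best = max(index, key=lambda d: float(np.dot(d["vector"], q))
                   / (np.linalg.norm(d["vector"]) * np.linalg.norm(q)))
        return pd.read_csv(best["path"])

In practice you would persist the summaries and vectors in a real vector store rather than an in-memory list, but the shape of the flow is the same.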

Obviously, this approach can get expensive if you have tons of CSV files, since each file costs at least one summarization call at indexing time.

There are a few other methods, like Chain-of-Table and the Mix-Self-Consistency approach, that I have read about but not implemented.
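
For flavor, the core idea of Mix-Self-Consistency is to sample answers from two reasoning modes (textual reasoning over the table versus executing LLM-generated code) and take a majority vote. A minimal sketch of just the voting step, assuming you supply the two answerer callables yourself (they are hypothetical here):

    # Hypothetical sketch of Mix-Self-Consistency's voting step. The two
    # answerers are assumed callables you provide: one reasons over a text
    # rendering of the table, the other executes LLM-generated pandas code.
    from collections import Counter

    import pandas as pd

    def mix_self_consistency(question: str, df: pd.DataFrame,
                             text_answerer, code_answerer,
                             n: int = 3) -> str:
        # Sample each reasoning mode n times, then majority-vote the answers.
        votes = [answer(question, df)
                 for answer in (text_answerer, code_answerer)
                 for _ in range(n)]
        return Counter(votes).most_common(1)[0][0]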

Upvotes: 1
