Building a production-level RAG for CSV files

I am tasked with building a production-level RAG application over CSV files. Possible approaches:

  1. Treat the CSVs as unstructured text: chunk the contents, embed them, and retrieve from a vector store.
  2. Load the CSV into a dataframe and have the LLM query it.

The first approach gives vague answers, since it applies an unstructured method to structured data; the second works very well, but I doubt its scalability. I need suggestions.

Upvotes: 0

Views: 1608

Answers (1)

Safder Raza

Reputation: 11

It depends on how many files you have in production and how large they are. Can you describe your use case a bit more?

  • How many CSV files do you expect to have?
  • Are the files related to each other, like tables in a database?

One issue you are going to run into, even with approach #2, is knowing which CSV file to load into the dataframe.

One approach that worked well for me, though I have not used it in a production environment, was:

  1. During indexing, instead of loading the whole CSV into the vector DB, use an LLM to summarize the CSV file. Index the summary and make sure the file path is included in the document metadata.
  2. At retrieval time, you should get good hits thanks to the table summaries. Find the source CSV in the retrieved document's metadata and load it into a dataframe (see the sketch after this list).
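
Here is a minimal sketch of that indexing-plus-routing flow in Python. It assumes the OpenAI v1 client; the model names, the data/*.csv glob, and the in-memory cosine-similarity lookup are illustrative placeholders, not a prescription:

    # Minimal sketch: index one LLM summary per CSV, keep the file path as
    # metadata, then route a query to the best-matching file.
    import glob

    import numpy as np
    import pandas as pd
    from openai import OpenAI

    client = OpenAI()

    def summarize_csv(path: str) -> str:
        # Summarize a sample of the file, not the whole thing.
        head = pd.read_csv(path, nrows=20).to_string()
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # example model name
            messages=[{
                "role": "user",
                "content": "Summarize this table: its columns and what "
                           "each row represents.\n\n" + head,
            }],
        )
        return resp.choices[0].message.content

    def embed(text: str) -> np.ndarray:
        resp = client.embeddings.create(
            model="text-embedding-3-small", input=text)
        return np.array(resp.data[0].embedding)

    # Indexing: one summary vector per file; the path travels as metadata.
    index = []
    for path in glob.glob("data/*.csv"):
        summary = summarize_csv(path)
        index.append({"path": path, "summary": summary,
                      "vector": embed(summary)})

    def retrieve(query: str) -> pd.DataFrame:
        # Cosine similarity against the summaries, then load the winner.
        q = embed(query)
        best = max(index, key=lambda d: float(np.dot(d["vector"], q))
                   / (np.linalg.norm(d["vector"]) * np.linalg.norm(q)))
        return pd.read_csv(best["path"])

In practice you would persist the summaries and vectors in a real vector store rather than an in-memory list, but the shape of the flow is the same.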

Obviously, this approach can get expensive if you have tons of CSV files, since each file costs at least one summarization call at indexing time.

There are a few other methods, like Chain-of-Table and the Mix-Self-Consistency approach, that I have read about but not implemented.
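
For flavor, the core idea of Mix-Self-Consistency is to sample answers from two reasoning modes (textual reasoning over the table versus executing LLM-generated code) and take a majority vote. A minimal sketch of just the voting step, assuming you supply the two answerer callables yourself (they are hypothetical here):

    # Hypothetical sketch of Mix-Self-Consistency's voting step. The two
    # answerers are assumed callables you provide: one reasons over a text
    # rendering of the table, the other executes LLM-generated pandas code.
    from collections import Counter

    import pandas as pd

    def mix_self_consistency(question: str, df: pd.DataFrame,
                             text_answerer, code_answerer,
                             n: int = 3) -> str:
        # Sample each reasoning mode n times, then majority-vote the answers.
        votes = [answer(question, df)
                 for answer in (text_answerer, code_answerer)
                 for _ in range(n)]
        return Counter(votes).most_common(1)[0][0]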

Upvotes: 1
