Oduah Tobi

Reputation: 45

Parsing large CSVs while searching Database

We currently have a tricky problem and need ideas for the most efficient way to solve it.

We periodically iterate through large CSV files (~50000 to 2m rows), and for each row, we need to check a database table for matching columns.

So for example, each CSV row could have details about an Event - artist, venue, date/time etc. - and for each row we check our database (PG) for the rows that best match that artist, venue and date/time, then perform operations if a match is found.

Pulling row by row made the entire process highly CPU-, memory- and time-intensive, so we now perform the matching in batches, but we are still looking for a way to do the comparison efficiently, both memory-wise and time-wise.
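A simplified sketch of the kind of batched lookup we currently do (the `Event` model, the CSV headers and `process_match` are illustrative placeholders, not our real code):

```ruby
require "csv"

# Stream the CSV and look up candidates one batch at a time instead of
# issuing one query per row. All names here are placeholders.
CSV.foreach("events.csv", headers: true).each_slice(1_000) do |batch|
  # One query per batch: pull every DB row for the artists in this batch.
  candidates = Event.where(artist: batch.map { |r| r["artist"] }.uniq).to_a

  batch.each do |row|
    starts_at = Time.zone.parse(row["date_time"])
    match = candidates.find do |e|
      e.artist == row["artist"] && e.venue == row["venue"] && e.starts_at == starts_at
    end
    process_match(row, match) if match # placeholder for the real operation
  end
end
```

Even batched like this, the per-row scan of `candidates` and the model instantiation happen in Ruby, which is where most of the CPU and memory cost goes.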

Thanks.

Upvotes: 0

Views: 49

Answers (1)

claasz

Reputation: 2134

  1. Load the complete CSV file into a temporary table in your database (using a DB tool; see e.g. How to import CSV file data into a PostgreSQL table?)
  2. Perform matching and operations in-database, i.e. in SQL
  3. If necessary, truncate the temporary table afterwards

This would move most of the load onto the DB server and avoid all the ActiveRecord overhead (network traffic, result parsing, model instantiation, etc.).
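From a Rails app, the whole flow could look roughly like the sketch below (the `tmp_events`/`events` table names, the column names and the exact join condition are illustrative; adapt them to your schema and matching logic):

```ruby
conn = ActiveRecord::Base.connection

# 1. Temporary table mirroring the CSV columns (names are illustrative).
conn.execute(<<~SQL)
  CREATE TEMP TABLE tmp_events (
    artist    text,
    venue     text,
    starts_at timestamptz
  )
SQL

# 2. Bulk-load the CSV with COPY, streaming it straight from disk.
raw = conn.raw_connection # PG::Connection from the pg gem
raw.copy_data("COPY tmp_events (artist, venue, starts_at) FROM STDIN WITH (FORMAT csv, HEADER true)") do
  File.foreach("events.csv") { |line| raw.put_copy_data(line) }
end

# 3. Match in one set-based query instead of one lookup per CSV row.
#    `matches` is a PG::Result that can be iterated if anything must happen in Ruby.
matches = conn.execute(<<~SQL)
  SELECT e.id AS event_id, t.artist, t.venue, t.starts_at
  FROM tmp_events t
  JOIN events e
    ON e.artist = t.artist
   AND e.venue  = t.venue
   AND e.starts_at = t.starts_at
SQL

# 4. TEMP tables disappear when the session ends; truncate sooner if the
#    connection is reused (e.g. by a connection pool).
conn.execute("TRUNCATE tmp_events")
```

Depending on what the per-match operations are, step 2 of the list can often be followed by an `UPDATE ... FROM tmp_events` or `INSERT ... SELECT` instead of a plain `SELECT`, so that no rows need to come back into Ruby at all.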

Upvotes: 1
