Reputation: 402
I have an app that takes the sales data that Whole Foods makes available to vendors and processes the daily sales by store and item. All the parent information is stored in one downloaded CSV with about 10,000 lines per month.
The import process checks for new stores before importing the sales information.
I don't know how to measure the running time of processes in Ruby and Rails, but I was wondering whether it would be faster to process one line at a time into each table, or to process the file once for one table (stores) and then again for the other table (sales).
If it matters, new stores are not added very often, though stores might be closed (the import checks for that as well), so the scan for stores might only add a few new entries, whereas every row of the CSV is added to the sales.
If this isn't an appropriate question, I apologize; I'm still working out the kinks of the rules.
Upvotes: 1
Views: 297
Reputation: 52357
When it comes to processing data with Ruby, memory consumption is what you should be concerned about.
For CSV processing in Ruby, the best you can do is read line by line:
require "csv"

file = CSV.open("data.csv")
while line = file.readline
  # do stuff with the parsed row (an array of field values)
end
file.close
This way, no matter how many lines are in the file, only a single line (plus the previously processed one) is loaded into memory at a time; the GC collects processed lines as your program executes. This approach consumes almost no memory, and it speeds up the parsing process, too.
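An equivalent, slightly more idiomatic form is CSV.foreach, which also streams one row at a time and opens and closes the file for you. This is only a sketch; the headers: true option and the column name used below are assumptions about your file layout:

require "csv"

# Streams the file row by row; with headers: true each yielded row is a
# CSV::Row that can be read by column name instead of by index.
CSV.foreach("data.csv", headers: true) do |row|
  # do stuff with row, e.g. row["store_number"] (hypothetical column name)
end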
I was wondering whether it would be faster to process one line at a time into each table, or to process the file for one table (stores) and then for the other table (sales)
I would go with one line at a time to each table.
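A minimal sketch of what that single pass could look like, assuming ActiveRecord models named Store and Sale (with a has_many :sales association) and hypothetical column names; the standard library's Benchmark module covers the timing part of your question:

require "csv"
require "benchmark"

# Assumed, hypothetical schema: Store(number, name) and
# Sale(store_id, item, quantity, sold_on). Adjust to your own models.
elapsed = Benchmark.realtime do
  CSV.foreach("data.csv", headers: true) do |row|
    # find_or_create_by only inserts when the store is new, so the
    # "check for new stores" step happens as part of the same pass.
    store = Store.find_or_create_by(number: row["store_number"]) do |s|
      s.name = row["store_name"]
    end

    store.sales.create!(
      item:     row["item"],
      quantity: row["quantity"],
      sold_on:  row["date"]
    )
  end
end

puts "Import took #{elapsed.round(2)} seconds"

Note that find_or_create_by issues a lookup query for every row; if that turns out to be slow, caching the store numbers you have already seen in a hash during the run would cut out most of those queries.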
Upvotes: 1