Reputation: 402
I have an app that takes the sales data that Whole Foods makes available to vendors and processes the daily sales by store and item. All the parent information is stored in one downloaded CSV with about 10,000 lines per month.
The import process checks for new stores before importing the sales information.
I don't know how to measure the running time of processes in Ruby and Rails, but I was wondering whether it would be faster to process one line at a time into each table, or to process the file once for one table (stores) and then again for the other table (sales).
If it matters, new stores are not added very often, though stores might be closed (the import checks for that as well), so the scan for stores might only add a few new entries, whereas every row of the CSV is added to the sales.
If this isn't an appropriate question, I apologize; I'm still working out the kinks of the rules.
Upvotes: 1
Views: 297
Reputation: 52357
When it comes to processing data with Ruby, memory consumption is what you should be concerned about.
For CSV processing in Ruby, the best you can do is read line by line:
require "csv"

file = CSV.open("data.csv")
while line = file.readline
  # do stuff with the parsed row (an array of field values)
end
file.close
This way, no matter how many lines are in the file, only a single line (plus the previously processed one) is loaded into memory at a time; the GC collects processed lines as your program executes. This approach consumes almost no memory, and it speeds up the parsing process, too.
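An equivalent, slightly more idiomatic form is CSV.foreach, which also streams one row at a time and opens and closes the file for you. This is only a sketch; the headers: true option and the column name used below are assumptions about your file layout:

require "csv"

# Streams the file row by row; with headers: true each yielded row is a
# CSV::Row that can be read by column name instead of by index.
CSV.foreach("data.csv", headers: true) do |row|
  # do stuff with row, e.g. row["store_number"] (hypothetical column name)
end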
I was wondering whether it would be faster to process one line at a time into each table, or to process the file for one table (stores) and then for the other table (sales)
I would go with one line at a time to each table.
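A minimal sketch of what that single pass could look like, assuming ActiveRecord models named Store and Sale (with a has_many :sales association) and hypothetical column names; the standard library's Benchmark module covers the timing part of your question:

require "csv"
require "benchmark"

# Assumed, hypothetical schema: Store(number, name) and
# Sale(store_id, item, quantity, sold_on). Adjust to your own models.
elapsed = Benchmark.realtime do
  CSV.foreach("data.csv", headers: true) do |row|
    # find_or_create_by only inserts when the store is new, so the
    # "check for new stores" step happens as part of the same pass.
    store = Store.find_or_create_by(number: row["store_number"]) do |s|
      s.name = row["store_name"]
    end

    store.sales.create!(
      item:     row["item"],
      quantity: row["quantity"],
      sold_on:  row["date"]
    )
  end
end

puts "Import took #{elapsed.round(2)} seconds"

Note that find_or_create_by issues a lookup query for every row; if that turns out to be slow, caching the store numbers you have already seen in a hash during the run would cut out most of those queries.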
Upvotes: 1