georges

Reputation: 282

Best solution to import and update millions of records every day on Heroku

I have to run rake tasks that import and update millions of products every day from CSV files. I currently use Heroku Scheduler to run these rake tasks, but I found that it doesn't work properly. I have one task that imports the data and another task that updates it.

My issue is:

What resource should I be using to handle this kind of big-data processing on Heroku? Or would another solution, such as Amazon AWS or Google Cloud, be better suited?

Upvotes: 0

Views: 84

Answers (1)

spickermann

Reputation: 106932

My first idea would be to import the CSV into a temp table in the database without any data transformation. That should be pretty fast with pure SQL.
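For example, with Heroku Postgres you could stream the file straight in with `COPY`. A minimal sketch, assuming the pg gem, a `products_staging` table whose columns match the CSV header, and a file path and rake namespace that are all made up for illustration:

```ruby
# lib/tasks/import.rake
namespace :import do
  desc "Bulk-load the daily CSV into a staging table with COPY"
  task load_csv: :environment do
    conn = ActiveRecord::Base.connection.raw_connection

    # Start from an empty staging table each day.
    conn.exec("TRUNCATE products_staging")

    # COPY streams the file into Postgres without building a Ruby object
    # per row, which is what keeps the load fast.
    conn.copy_data("COPY products_staging FROM STDIN WITH (FORMAT csv, HEADER true)") do
      File.foreach("tmp/products.csv") { |line| conn.put_copy_data(line) }
    end
  end
end
```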

Then I would create one background job for each row in the temp table (for example with Sidekiq), and each of those jobs would import only that specific row.
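A minimal sketch of such a per-row job, assuming a `ProductsStaging` model mapped to the staging table, a `Product` model for the real data, and a unique index on `products.sku` (all of these names are hypothetical):

```ruby
class ImportProductRowJob
  include Sidekiq::Job

  def perform(staging_id)
    row = ProductsStaging.find(staging_id)

    # Upsert into the real table so the same job covers both the initial
    # import and the daily update of an existing product.
    Product.upsert(
      { sku: row.sku, name: row.name, price: row.price },
      unique_by: :sku
    )
  end
end
```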

Running multiple background jobs in parallel should speed up the process significantly. Plus, when one row raises an error and cannot be imported, the other jobs are not affected, and the background job queue shows you exactly which job, and therefore which row, could not be imported and why.
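Since enqueuing millions of jobs one at a time is itself slow, pushing them in batches helps. A sketch of that step, using the same hypothetical names as above and Sidekiq's `perform_bulk` (available in recent Sidekiq versions; older ones would use `Sidekiq::Client.push_bulk`); the batch size of 1,000 is an arbitrary choice:

```ruby
ProductsStaging.in_batches(of: 1_000) do |batch|
  # One set of job arguments per staging row in this batch.
  ImportProductRowJob.perform_bulk(batch.ids.map { |id| [id] })
end
```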

Upvotes: 1
