Jônatas Zulske
Jônatas Zulske

Reputation: 87

Import big file into mysql, on a Heroku app

I need some help.

I have an php app on Heroku. In this app, there's a form that upload an csv file, to be imported on Mysql(cleardb).

The problem it's, that the file it's large (will always be large), and the function it's taking too much time to finish (about 90 seconds). The timeout on heroku it's 30 seconds, and there's no way to change that.

I tried to use Heroku Scheduler (like cron), but the minimal frequency it's 10 minutes, and a script that can take 90 seconds, using this scheduler, will take 30 minutes, because as i said, the timeout of heroku it's 30 seconds.

Well, what can i do? there's an alternative scheduler?

Example of the import:

CSV

name,productName,points,categoryName,coordName,date

MYSQL

[users]

userID
userName
categoryID
coordID

[products]

productID
productName

[coords]

coordID
coordName

[categories]

categoryID
categoryName

[points]

pointID
productID
userID
value

in all tables, i need to make a select to see if the category, coord, etc, already exists. If exists, return id, if not, insert a new line.

I dont think that there's a way to decrease time execution time. I'm trying to find a way to decrease the schedule to 2 minutes, 3 minutes, etc. So, in about 10 minutes, all lines will be imported.

thanks!

Upvotes: 0

Views: 452

Answers (1)

VolkerK
VolkerK

Reputation: 96159

This is what I would start with (because it's relatively simple/quick to implement and should give you a reference point and some wiggle room for further tests in a short period of time):

Import all the data as-is into a temporary table (if the server's RAM allow you can also try the memory engine).
Then, after the data has been imported, create the indices needed for the following queries (and check via EXPLAIN or any other tool that shows you if and how the indices are used):

  • query all the categories that are in the temporary table but not in your live data tables
    • create those categories in the live tables.
  • query all coords that are in the temporary table but not in your live data tables.
    • create those coords in the live tables.
  • you get the idea ...repeat for all necessary data.
  • then just import the data from the temp table into the live tables via INSERT...SELECT queries. Think about what kind of transaction/locking you will need for this. It might be that the order of queries will make a difference. But if you're only adding data, I assume that a rather low isolation level should do... not sure though. But maybe that's not your concern right now?

Upvotes: 1

Related Questions