Daily import into Heroku hosted Postgres database

Question

I'm in the process of porting a MySQL database to a dedicated PostgreSQL instance hosted on Heroku. I understand how to get the initial data over to Heroku. However, an external company supplies a daily "feed" of data that will need to be imported each day. It is pushed to an FTP server as a zip file containing several different CSV files. Normally I would just scp it over to the Postgres box and have a cron job run "COPY tablename FROM path/to/file.csv" to import the data. With Heroku, though, I'm a bit baffled as to the best way to do this. Note: I've seen and reviewed the Heroku Dev Center article on importing data, but it covers restoring a dump file, whereas I'm dealing with a daily import from CSV files.
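The fetch-and-unzip part of that workflow needs nothing outside the Python standard library. Below is a minimal sketch, assuming plain (non-TLS) FTP with a username and password and an archive small enough to hold in memory. The function names are hypothetical, and the host, credentials, and remote path would come from the feed provider.

```python
import ftplib
import io
import zipfile


def fetch_zip_bytes(host, user, password, remote_path):
    """Download one file from an FTP server into memory and return its bytes."""
    buf = io.BytesIO()
    with ftplib.FTP(host) as ftp:
        ftp.login(user, password)
        # RETR streams the file in blocks; each block is appended to the buffer.
        ftp.retrbinary("RETR " + remote_path, buf.write)
    return buf.getvalue()


def csv_members(zip_bytes):
    """Return {member_name: raw_bytes} for every .csv file inside a zip archive."""
    members = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            # Match the extension case-insensitively and skip everything else.
            if name.lower().endswith(".csv"):
                members[name] = zf.read(name)
    return members
```

Each value returned by `csv_members` can be wrapped in `io.BytesIO` to get a file-like object without touching the filesystem.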

Does anyone do something similar on Heroku? If so, could you give any advice on the best way to do it?

A bit more info: my application is Python/Django 1.3.3 on the Cedar stack, and my files can be fairly large; some contain over 50K records. So looping through them and saving each row with the Django ORM is probably going to be slow (but it still might be the best or only solution).
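Looping over 50K rows in Python is not slow in itself; the cost usually comes from the ORM issuing one INSERT per row. If the feed needs light cleanup before loading (for example, dropping or reordering columns), one option is to reshape it with the `csv` module and still load the result in bulk with a single COPY. This is a sketch only: `reshape_csv` and the column names in the usage line are hypothetical.

```python
import csv
import io


def reshape_csv(src_text, keep_columns):
    """Keep only `keep_columns` (matched by header name), in that order.

    Returns CSV text without a header row, suitable for COPY ... FROM STDIN
    in CSV format (without the HEADER option).
    """
    reader = csv.DictReader(io.StringIO(src_text))
    out = io.StringIO()
    # Use "\n" line endings; the csv module quotes fields only when needed.
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([row[col] for col in keep_columns])
    return out.getvalue()
```

For example, `reshape_csv(text, ["name", "id"])` turns a three-column feed into a two-column CSV whose order matches a hypothetical target table.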

hgmnz · Accepted Answer

Two options:

1. Boot up a non-Heroku EC2 instance, fetch the file from FTP, unzip it, and initiate the copy from there. By using the COPY ... FROM STDIN option (http://www.postgresql.org/docs/9.1/static/sql-copy.html) you can tell Postgres that the data is coming over the client connection, as opposed to from a file on the server's filesystem, which you don't have access to.
2. How large is the file? It might fit in a dyno's ephemeral filesystem, so a process or one-off job can download the file from the FTP server and do the whole process from within a dyno. Once the process exits, the filesystem data goes away.
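Both options come down to running COPY ... FROM STDIN from a client. Below is a minimal sketch using psycopg2, assuming psycopg2 is installed. Its `cursor.copy_expert()` method sends a COPY statement and streams a file-like object over the connection, so the same code works from an EC2 box or from a dyno. The table name `orders`, the file `orders.csv`, and the `main` entry point are hypothetical. Reading the connection string from the `DATABASE_URL` config var assumes the standard Heroku Postgres setup.

```python
import os


def copy_csv(conn, table, csv_file, columns=None, header=True):
    """Stream a CSV file object into `table` using COPY ... FROM STDIN.

    The data travels over the client connection, so the file never has to
    exist on the database server's filesystem. `table` and `columns` are
    interpolated directly into the SQL, so they must come from trusted code,
    never from the feed itself. Returns the SQL that was executed.
    """
    cols = " (%s)" % ", ".join(columns) if columns else ""
    sql = "COPY %s%s FROM STDIN WITH CSV%s" % (table, cols, " HEADER" if header else "")
    cur = conn.cursor()
    try:
        cur.copy_expert(sql, csv_file)
    finally:
        cur.close()
    # One commit per file: if COPY raised above, nothing from this file is saved.
    conn.commit()
    return sql


def main():
    """Hypothetical entry point: load orders.csv into the orders table."""
    import psycopg2  # assumed to be installed, e.g. via requirements.txt

    # Heroku Postgres publishes the connection string in DATABASE_URL.
    conn = psycopg2.connect(os.environ["DATABASE_URL"])
    try:
        with open("orders.csv", "rb") as f:  # hypothetical file/table names
            copy_csv(conn, "orders", f)
    finally:
        conn.close()
```

If a COPY fails, `commit()` is never reached, so no partial rows from that file are stored; call `conn.rollback()` before reusing the connection. On Heroku a script like this could run as a one-off dyno, and the Heroku Scheduler add-on is one way to trigger it daily; on EC2, cron would do the same job.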
