Matt Bannert
Matt Bannert

Reputation: 28274

Speed up INSERT of 1 million+ rows into Postgres via R using COPY?

I would like to bulk-INSERT/UPSERT a moderately large amount of rows to a postgreSQL database using R. In order to do so I am preparing a multi-row INSERT string using R.

 query <- sprintf("BEGIN;
                         CREATE TEMPORARY TABLE 
                         md_updates(ts_key varchar, meta_data hstore) ON COMMIT DROP;

                         INSERT INTO md_updates(ts_key, meta_data) VALUES %s;
                         LOCK TABLE %s.meta_data_unlocalized IN EXCLUSIVE MODE;

                         UPDATE %s.meta_data_unlocalized
                         SET meta_data = md_updates.meta_data
                         FROM md_updates
                         WHERE md_updates.ts_key = %s.meta_data_unlocalized.ts_key;
                         COMMIT;", md_values, schema, schema, schema, schema)

DBI::dbGetQuery(con,query)

The entire function can be found here. Surprisingly (at leat to me) I learned that the UPDATE part is not the problem. I left it out and ran a the query again and it wasn't much faster. INSERT a million+ records seems to be the issue here.

I did some research and found quite some information:

bulk inserts

bulk inserts II

what causes large inserts to slow down

answers from @Erwin Brandstetter and @Craig Ringer were particularly helpful. I was able to speed things up quite a bit by dropping indices and following a few other suggestions.

However, I struggled to implement another suggestion which sounded promising: COPY. The problem is I can't get it done from within R.

The following works for me:

sql <- sprintf('CREATE TABLE 
            md_updates(ts_key varchar, meta_data hstore);
            COPY md_updates FROM STDIN;')


 dbGetQuery(sandbox,"COPY md_updates FROM 'test.csv' DELIMITER ';' CSV;")

But I can't get it done without reading from a extra .csv file. So my questions are:

I am using PostgreSQL 9.5 on OS X and 9.5 on RHEL respectively.

Upvotes: 6

Views: 7111

Answers (1)

Neil McGuigan
Neil McGuigan

Reputation: 48277

RPostgreSQL has a "CopyInDataframe" function that looks like it should do what you want:

install.packages("RPostgreSQL")
library(RPostgreSQL)
con <- dbConnect(PostgreSQL(), user="...", password="...", dbname="...", host="...")
dbSendQuery(con, "copy foo from stdin")
postgresqlCopyInDataframe(con, df)

Where table foo has the same columns as dataframe df

Upvotes: 9

Related Questions