Reputation: 28274
I would like to bulk-INSERT/UPSERT a moderately large amount of rows to a postgreSQL database using R. In order to do so I am preparing a multi-row INSERT string using R.
query <- sprintf("BEGIN;
CREATE TEMPORARY TABLE
md_updates(ts_key varchar, meta_data hstore) ON COMMIT DROP;
INSERT INTO md_updates(ts_key, meta_data) VALUES %s;
LOCK TABLE %s.meta_data_unlocalized IN EXCLUSIVE MODE;
UPDATE %s.meta_data_unlocalized
SET meta_data = md_updates.meta_data
FROM md_updates
WHERE md_updates.ts_key = %s.meta_data_unlocalized.ts_key;
COMMIT;", md_values, schema, schema, schema, schema)
DBI::dbGetQuery(con,query)
The entire function can be found here. Surprisingly (at leat to me) I learned that the UPDATE part is not the problem. I left it out and ran a the query again and it wasn't much faster. INSERT a million+ records seems to be the issue here.
I did some research and found quite some information:
what causes large inserts to slow down
answers from @Erwin Brandstetter and @Craig Ringer were particularly helpful. I was able to speed things up quite a bit by dropping indices and following a few other suggestions.
However, I struggled to implement another suggestion which sounded promising: COPY
. The problem is I can't get it done from within R.
The following works for me:
sql <- sprintf('CREATE TABLE
md_updates(ts_key varchar, meta_data hstore);
COPY md_updates FROM STDIN;')
dbGetQuery(sandbox,"COPY md_updates FROM 'test.csv' DELIMITER ';' CSV;")
But I can't get it done without reading from a extra .csv file. So my questions are:
Is COPY
really a promising approach here (over the multi-row INSERT I got?
Is there a way to use COPY
from within R without writing data to a file. Data does fit in memory and since it's already in mem why write to disk?
I am using PostgreSQL 9.5 on OS X and 9.5 on RHEL respectively.
Upvotes: 6
Views: 7111
Reputation: 48277
RPostgreSQL has a "CopyInDataframe" function that looks like it should do what you want:
install.packages("RPostgreSQL")
library(RPostgreSQL)
con <- dbConnect(PostgreSQL(), user="...", password="...", dbname="...", host="...")
dbSendQuery(con, "copy foo from stdin")
postgresqlCopyInDataframe(con, df)
Where table foo
has the same columns as dataframe df
Upvotes: 9