Gaurav

Reputation: 1587

How to write a table in PostgreSQL from R?

At present, to insert data into a PostgreSQL table I have to create an empty table and then run an insert into table values ... statement, with the whole dataframe collapsed into a single string of values. This doesn't work for large dataframes.
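Here is a simplified sketch of my current approach (table and column names are placeholders):

## current approach (schematic): collapse the whole dataframe into one
## big "values" list and send a single insert statement
library(RPostgreSQL)
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname=..., user=..., password=...)

df <- data.frame(a = 1:3, b = c("x", "y", "z"))

## naive quoting, no escaping of embedded quotes
rows <- apply(df, 1, function(r) paste0("'", r, "'", collapse = ","))
sql <- paste0("insert into table_1 values (", paste(rows, collapse = "),("), ");")
dbSendQuery(con, sql)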

The dbWriteTable() function doesn't work for PostgreSQL and gives the following error:

Error in postgresqlpqExec(new.con, sql4) : RS-DBI driver: (could not Retrieve the result : ERROR: syntax error at or near "STDIN" LINE 1: COPY "table_1" FROM STDIN

I have tried the following hack, as suggested in an answer to a similar question asked before. Here's the link: How do I write data from R to PostgreSQL tables with an autoincrementing primary key?

## patch postgresqlWriteTable so that the COPY statement lists the
## column names explicitly
body_lines <- deparse(body(RPostgreSQL::postgresqlWriteTable))
new_body_lines <- sub(
  'postgresqlTableRef(name), "FROM STDIN")', 
  'postgresqlTableRef(name), "(", paste(shQuote(names(value)), collapse = ","), ") FROM STDIN")', 
  body_lines,
  fixed = TRUE
)
fn <- RPostgreSQL::postgresqlWriteTable
body(fn) <- parse(text = new_body_lines)
## detach the package before replacing the function in its namespace
while ("package:RPostgreSQL" %in% search()) detach("package:RPostgreSQL")
assignInNamespace("postgresqlWriteTable", fn, "RPostgreSQL")
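With the patch applied, I then call dbWriteTable() as usual, for example:

## after patching (con is an open RPostgreSQL connection, df my dataframe):
dbWriteTable(con, "table_1", df, row.names = FALSE)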

This hack still doesn't work for me; postgresqlWriteTable() throws exactly the same error. What exactly is the problem here?

As an alternative I have tried dbWriteTable2() from the caroline package, and it throws a different error:

Error in postgresqlExecStatement(conn, statement, ...) : 
  RS-DBI driver: (could not Retrieve the result : ERROR:  column "id" does not exist in table_1
)
creating NAs/NULLs for for fields of table that are missing in your df
Error in postgresqlExecStatement(conn, statement, ...) : 
  RS-DBI driver: (could not Retrieve the result : ERROR:  column "id" does not exist in table_1
)

Is there any other method to write a large dataframe into a table in PostgreSQL directly?

Upvotes: 6

Views: 9984

Answers (3)

Rahul Jakhmola

Reputation: 41

So the earlier answer showing batch processing is 99.99% correct. However, it doesn't work on Windows, because the 'insertBatch' function needs one small extra argument. (I was not able to add a comment on that answer.)

The 'shQuote' function requires the argument type = 'cmd2' for it to work.

However, to pass that extra argument through inside 'lapply', you need this answer:

https://stackoverflow.com/questions/6827299/r-apply-function-with-multiple-parameters
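(The point needed from that answer is simply that lapply() forwards extra named arguments to the function it applies; for a dataframe df:)

## extra named arguments to lapply() are passed through to the function,
## so this applies shQuote(x, type = 'cmd2') to every column of df:
lapply(df, shQuote, type = 'cmd2')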

So, the new 'insertBatch' function becomes:

insertBatch <- function(con, tn, df, size = 100L) {
  if (nrow(df) == 0L) return(invisible());
  cnt <- (nrow(df) - 1L) %/% size + 1L;  ## number of batches
  for (i in seq(0L, len = cnt)) {
    ## build one multi-row insert for batch i, quoting values with
    ## type = 'cmd2' so the quoting also works on Windows
    sql <- paste0(
      "insert into \"", tn, "\" values (",
      do.call(paste, c(sep = ',', collapse = '),(',
        lapply(df[seq(i*size + 1L, min(nrow(df), (i + 1L)*size)), ],
          shQuote, type = 'cmd2'))),
      ");");
    dbSendQuery(con, sql);
  };
  invisible();
};

Upvotes: 0

heiko

Reputation: 267

I had the same error while working through this example.

What worked for me was:

dbWriteTable(con, "cartable", value = df, overwrite = TRUE, append = FALSE, row.names = FALSE)

I had previously set up a table "cartable" in pgAdmin, so an empty table already existed and I had to overwrite it with values.
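For completeness, a minimal end-to-end sketch of this approach (the connection details and the "cartable" columns below are made up):

## minimal sketch; dbname/user/password and the column layout are placeholders
library(RPostgreSQL)
con <- dbConnect(dbDriver("PostgreSQL"), dbname=..., user=..., password=...)

df <- data.frame(brand = c("VW", "Audi"), price = c(20000, 30000))

## "cartable" already exists (created in pgAdmin), so overwrite it
dbWriteTable(con, "cartable", value = df, overwrite = TRUE, append = FALSE,
             row.names = FALSE)

dbReadTable(con, "cartable")  ## verify the contents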

Upvotes: 3

bgoldst

Reputation: 35324

OK, I'm not sure why dbWriteTable() would be failing; there may be some kind of version/protocol mismatch. Perhaps you could try installing the latest versions of R and the RPostgreSQL package, and upgrading the PostgreSQL server on your system, if possible.

Regarding the insert into workaround failing for large data: when large amounts of data must be moved and a one-shot transfer is infeasible, impractical, or flaky, the usual solution in the IT world is what is sometimes referred to as batching or batch processing. Basically, you divide the data into smaller chunks and send each chunk one at a time.

As a random example, a few years ago I wrote some Java code to query for employee information from an HR LDAP server which was constrained to only provide 1000 records at a time. So basically I had to write a loop to keep sending the same request (with the query state tracked using some kind of weird cookie-based mechanism) and accumulating the records into a local database until the server reported the query complete.

Here's some code that manually constructs the SQL to create an empty table based on a given data.frame, and then inserts the content of the data.frame into the table using a parameterized batch size. It's mostly built around calls to paste() to build the SQL strings, and dbSendQuery() to send the actual queries. I also use postgresqlDataType() for the table creation.

## connect to the DB
library('RPostgreSQL'); ## loads DBI automatically
drv <- dbDriver('PostgreSQL');
con <- dbConnect(drv,host=...,port=...,dbname=...,user=...,password=...);

## define helper functions
createEmptyTable <- function(con,tn,df) {
    ## map each column of df to a PostgreSQL type via postgresqlDataType()
    ## and build the corresponding "create table" statement
    sql <- paste0(
        "create table \"",tn,"\" (",
        paste0(collapse=',','"',names(df),'" ',sapply(df[0,],postgresqlDataType)),
        ");"
    );
    dbSendQuery(con,sql);
    invisible();
};

insertBatch <- function(con,tn,df,size=100L) {
    if (nrow(df)==0L) return(invisible());
    cnt <- (nrow(df)-1L)%/%size+1L;  ## number of batches
    for (i in seq(0L,len=cnt)) {
        ## one multi-row insert statement per batch of up to 'size' rows
        sql <- paste0(
            "insert into \"",tn,"\" values (",
            do.call(paste,c(sep=',',collapse='),(',
                lapply(df[seq(i*size+1L,min(nrow(df),(i+1L)*size)),],shQuote))),
            ");"
        );
        dbSendQuery(con,sql);
    };
    invisible();
};

## generate test data
NC <- 1e2L; NR <- 1e3L; df <- as.data.frame(replicate(NC,runif(NR)));

## run it
tn <- 't1';
dbRemoveTable(con,tn);
createEmptyTable(con,tn,df);
insertBatch(con,tn,df);
res <- dbReadTable(con,tn);
all.equal(df,res);
## [1] TRUE

Note that I didn't bother prepending a row.names column to the database table, unlike dbWriteTable(), which always seems to include such a column (and doesn't seem to provide any means of preventing it).

Upvotes: 7
