Alexander Kachkaev

Reputation: 912

Quickest way to load data from PostgreSQL to R

I’m planning to do some data analysis with R; the datasets are stored in PostgreSQL tables and some of them contain up to 2 million records. I thought this would not be a big problem for R and that loading the records would be rather quick, but things turned out differently.

Doing something like this may take a minute or more, which is not what I was expecting:

library(RPostgreSQL);

drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname = "mydb", user="me", password="my_password");

records <- dbGetQuery(con, statement = paste(
  "SELECT *",
  "FROM my_table",
  "ORDER BY id")); # 1.5M records

Alternative code is even slower:

records2 <- dbSendQuery(con, "select * from my_table ORDER BY id")
fetch(records2,n=-1)

I can’t say my hardware is the most advanced in the world, but it’s a rather decent MacBook Pro with 8 GB of RAM and an SSD. When I fetch the same data with, let’s say, QGIS, things are done significantly faster.

What can one do to increase performance in such a case? Alternative libraries? Tricks and hacks? Anything else?

Upvotes: 4

Views: 720

Answers (2)

jangorecki

Reputation: 16697

You should exclude ORDER BY - it is not part of loading the data, and it may significantly slow down your query.
You can order the data afterwards, once it is in R memory. If you are looking for fast ordering, check this SO answer.
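A minimal sketch of that approach, reusing the connection and table name from the question (the use of data.table here is just one option for fast in-memory ordering):

library(RPostgreSQL)
library(data.table)

drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname = "mydb", user = "me", password = "my_password")

# fetch without ORDER BY, so the server simply streams the rows
records <- dbGetQuery(con, "SELECT * FROM my_table")

# order in R memory instead; setorder() sorts the data.table by reference
setDT(records)
setorder(records, id)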

Upvotes: 1

Kirk Roybal

Reputation: 17857

More of a redesign than an answer to the question, but...

You could always plug R directly into PostgreSQL, and run your query without moving the data anywhere. Can't move it any faster than not moving it at all :)

PL/R for PostgreSQL
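For example, a rough sketch of the idea, assuming the PL/R extension is installed on the server (the function name and the summary it returns are made up for illustration; pg.spi.exec is PL/R's helper for running SQL from the embedded R session):

-- run on the PostgreSQL server; requires PL/R to be installed
CREATE EXTENSION IF NOT EXISTS plr;

-- hypothetical function: process my_table inside the server-side R session,
-- so only the small result crosses the wire
CREATE OR REPLACE FUNCTION my_table_row_count() RETURNS text AS '
  records <- pg.spi.exec("SELECT * FROM my_table")
  paste(nrow(records), "rows processed without leaving the database")
' LANGUAGE plr;

SELECT my_table_row_count();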

Upvotes: 0
