Minh Ha Pham

Reputation: 2596

Load large data into an R data.table from PostgreSQL

I store my data on a PostgreSQL server. I want to load a table with 15 million rows into a data.frame or data.table.

I use RPostgreSQL to load the data:

library(RPostgreSQL)
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, ...)  # connection details omitted

# Select all rows from the table and time the query
system.time(
  df <- dbGetQuery(con, "SELECT * FROM 15mil_rows_table")
)

It takes about 20 minutes to load the data from the database into df. I use a Google Cloud server with 60 GB of RAM and a 16-core CPU.

What should I do to reduce the load time?

Upvotes: 7

Views: 2942

Answers (2)

Minh Ha Pham

Reputation: 2596

I used the same method as @jangorecki, but compressed the dump with gzip to save space.

1- Dump the table to a gzipped CSV:

psql -h localhost -U user -d 'database' -c "COPY 15mil_rows_table TO stdout DELIMITER ',' CSV HEADER" | gzip > 15mil_rows_table.csv.gz &

2- Load the data in R:

library(data.table)
DT <- fread('zcat 15mil_rows_table.csv.gz')
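For completeness, both steps can also be driven from a single R script. This is only a sketch: the host, user and database names are placeholders (not from the post above), and it assumes psql, gzip and zcat are available on the machine running R.

    library(data.table)

    # Sketch: run the dump and the read from R in one go.
    # Connection flags below are placeholders; replace with your own.
    dump_cmd <- paste(
      "psql -h localhost -U user -d database -c",
      shQuote("COPY 15mil_rows_table TO stdout DELIMITER ',' CSV HEADER"),
      "| gzip > 15mil_rows_table.csv.gz"
    )
    system(dump_cmd)                                   # dump the table to a gzipped CSV
    DT <- fread(cmd = "zcat 15mil_rows_table.csv.gz")  # stream-decompress and parse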

Upvotes: 3

jangorecki

Reputation: 16697

I'm not sure whether this will reduce the load time, but it may, since both steps are quite efficient. Please leave a comment with the timing.

  1. Using bash, run psql to dump the table to CSV:

COPY 15mil_rows_table TO '/path/15mil_rows_table.csv' DELIMITER ',' CSV HEADER;

  2. In R, just fread it:

library(data.table)
DT <- fread("/path/15mil_rows_table.csv")
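If only some columns are needed, fread can be told to read just those, which cuts both parsing time and memory use. A small sketch, assuming hypothetical column names:

    library(data.table)

    # Read only the columns you actually need (column names here are made up)
    DT <- fread("/path/15mil_rows_table.csv",
                select  = c("id", "value"),   # skip unneeded columns entirely
                nThread = 4)                  # fread parses in parallel; tune to your CPU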

Upvotes: 4
