roarkz

Reputation: 831

Is there a faster way to upload data from R to MySQL?

I am using the following code to upload a new table into a MySQL database.

library(RMySQL)
library(RODBC)

con <- dbConnect(MySQL(),
  user = 'user',
  password = 'pw',
  host = 'amazonaws.com',
  dbname = 'db_name')

dbSendQuery(con, "CREATE TABLE table_1 (
        var_1 VARCHAR(50),
        var_2 VARCHAR(50),
        var_3 DOUBLE,
        var_4 DOUBLE);
        ")

channel <- odbcConnect("db_name")
sqlSave(channel, dat = df, tablename = "tb_name", rownames = FALSE,
        append = TRUE)

The full data set is 68 variables and 5 million rows. It is taking over 90 minutes to upload 50 thousand rows to MySQL. Is there a more efficient way to upload the data? I originally tried dbWriteTable(), but that would result in an error saying the connection to the database was lost.

Upvotes: 3

Views: 4048

Answers (2)

Parfait

Reputation: 107587

Consider a CSV export from R for an import into MySQL with LOAD DATA INFILE:

...
write.csv(df, "/path/to/filename.csv", row.names=FALSE)

# write.csv emits a header row by default, hence IGNORE 1 LINES
dbSendQuery(con, "LOAD DATA LOCAL INFILE '/path/to/filename.csv'
                  INTO TABLE mytable
                  FIELDS TERMINATED BY ','
                  ENCLOSED BY '\"'
                  LINES TERMINATED BY '\\n'
                  IGNORE 1 LINES")
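
Note that LOCAL imports also have to be permitted on the server side. A quick check and, if needed, enable from the same connection (assuming your account has the privilege to change global variables):

# check whether the server permits LOCAL imports
dbGetQuery(con, "SHOW VARIABLES LIKE 'local_infile'")

# enable it if needed (requires a sufficiently privileged account)
dbSendQuery(con, "SET GLOBAL local_infile = 1")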

Upvotes: 4

knb

Reputation: 9295

You could try disabling the MySQL general query log:

dbSendQuery(con, "SET GLOBAL general_log = 'off'")

I can't tell whether your MySQL user account has the appropriate permissions to do that, or whether it conflicts with your business needs.

Off the top of my head: otherwise you could try sending the data in, say, 1000-row batches, using a for loop in your R script, perhaps with option verbose = TRUE in your call to sqlSave (see the sketch below).
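
A minimal, untested sketch of that loop, reusing df, the channel and the table name from your question (the 1000-row batch size is only a starting point to tune):

library(RODBC)

channel <- odbcConnect("db_name")

for (start in seq(1, nrow(df), by = 1000)) {
  end <- min(start + 999, nrow(df))
  # append each 1000-row slice; verbose = TRUE echoes the statements sent
  sqlSave(channel, dat = df[start:end, ], tablename = "tb_name",
          rownames = FALSE, append = TRUE, verbose = TRUE)
}

close(channel)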

If you send the data in a single batch, MySQL might try to run the INSERT as a single transaction ("all-or-nothing"), and if it fails it goes into recovery or simply aborts after inserting some arbitrary number of rows.

Upvotes: 0
