Reputation: 831
I am using the following code to upload a new table into a mysql database.
library(RMySql)
library(RODBC)
con <- dbConnect(MySQL(),
user = 'user',
password = 'pw',
host = 'amazonaws.com',
dbname = 'db_name')
dbSendQuery(con, "CREATE TABLE table_1 (
var_1 VARCHAR(50),
var_2 VARCHAR(50),
var_3 DOUBLE,
var_4 DOUBLE);
")
channel <- odbcConnect("db name")
sqlSave(channel, dat = df, tablename = "tb_name", rownames = FALSE, append =
TRUE)
The full data set is 68 variables and 5 million rows. It is taking over 90 minutes to upload 50 thousand rows to MySql. Is there a more efficient way to upload the data to MySql. I originally tried dbWriteTable()
but this would result in an error message saying the connection to the database was lost.
Upvotes: 3
Views: 4048
Reputation: 107587
Consider a CSV export from R for an import into MySQL with LOAD DATA INFILE
:
...
write.csv(df, "/path/to/filename.csv", row.names=FALSE)
dbSendQuery(con, "LOAD DATA LOCAL INFILE '/path/to/filename.csv'
INTO TABLE mytable
FIELDS TERMINATED by ','
ENCLOSED BY '"'
LINES TERMINATED BY '\\n'")
Upvotes: 4
Reputation: 9295
You could try to disable the mysql query log:
dbSendQuery(con, "SET GLOBAL general_log = 'off'")
I can't tell if your mysql user account has the appropriate permissions to do that, or if it conflicts with your business needs.
Off the top of my head: Otherwise you could try to send the data in say 1000-row batches, using a for- loop in your Rscript, and maybe option verbose = true
in your call to sqlSave
If you send the data in a single batch, Mysql might try to run the INSERT as a single transaction ("all-or-nothing") and if it fails it goes into recovery or just fails after inserting some random number of rows.
Upvotes: 0