user7426583

RMySQL dbReadTable takes too long

I am using the RMySQL package with the DBI package in R. When I run

dbReadTable(con, "data") 

it is taking forever.

I think the table is very large. Any ideas on how to speed this up?

Thanks,

Upvotes: 1

Views: 831

Answers (1)

wibeasley

Reputation: 5287

Try to get the database to do as much filtering & processing as possible. A database has many more ways to optimize operations than R, and isn't constrained by RAM so severely. Filtering on the server also reduces the amount of data that has to travel across the network.

Common tactics are:

  • use a WHERE clause to reduce rows
  • explicitly list only the necessary columns, instead of using *
  • do as much aggregation in SQL as possible (e.g., GROUP BY + MAX)
  • use INSERT queries to write from table to table, so the data doesn't even need to pass through R.
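
A rough sketch of those tactics with DBI, in case it helps. To make it runnable without a server, it assumes an in-memory SQLite database (via the RSQLite package) as a stand-in for the MySQL connection; the columns `id`, `grp`, `value`, `notes` and the `data_summary` table are made up for illustration. With RMySQL, `con` would come from `dbConnect(RMySQL::MySQL(), ...)` and the queries would be sent the same way.

```r
library(DBI)

# ASSUMPTION: in-memory SQLite stands in for the MySQL server so this runs anywhere.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical contents for the "data" table from the question.
dbWriteTable(con, "data", data.frame(
  id    = 1:6,
  grp   = c("a", "a", "b", "b", "c", "c"),
  value = c(10, 20, 30, 40, 50, 60),
  notes = "unused"
))

# WHERE trims the rows; naming columns avoids pulling everything with *.
subset_rows <- dbGetQuery(con, "SELECT id, value FROM data WHERE value > 25")

# Aggregate in SQL so only one row per group travels back to R.
group_max <- dbGetQuery(
  con,
  "SELECT grp, MAX(value) AS max_value FROM data GROUP BY grp ORDER BY grp"
)

# INSERT ... SELECT copies table to table inside the database, bypassing R.
dbExecute(con, "CREATE TABLE data_summary (grp TEXT, max_value REAL)")
rows_written <- dbExecute(
  con,
  "INSERT INTO data_summary SELECT grp, MAX(value) FROM data GROUP BY grp"
)

dbDisconnect(con)
```

Compared with `dbReadTable(con, "data")`, each `dbGetQuery()` call here returns only the rows and columns the query asks for, so less data has to be transferred and held in memory.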

I imagine RMySQL should be faster than the newish odbc package, but odbc is worth experimenting with.

What's 'forever'? 5 min or 5 hours? Are things still slow once the data get to R? If things are still too slow to be feasible, consider escalating to something like sparklyr.

Upvotes: 1
