I am using the RMySQL package with the DBI package in R. When I run the code
dbReadTable(con, "data")
it takes forever.
I think the table is very large. Any ideas on how to speed up this process?
Thanks,
Try to get the database to do as much filtering & processing as possible. A database has many more ways to optimize operations than R, and isn't constrained by RAM so severely. It also reduces the amount that has to travel across the network.
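A minimal sketch of the difference, assuming a table named `data` with hypothetical columns `id`, `grp` and `value`. To keep it self-contained it uses an in-memory SQLite database (via RSQLite) as a stand-in for MySQL; the DBI calls `dbReadTable()` and `dbGetQuery()` are the same ones you would call on an RMySQL connection. The second query also lists only the columns it needs, which further cuts what crosses the connection.

```r
library(DBI)

# Stand-in for the MySQL connection: an in-memory SQLite database, used only
# so this sketch runs without a server. With RMySQL you would connect via
# dbConnect(RMySQL::MySQL(), ...) instead.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table "data" with 100,000 rows and three columns.
set.seed(1)
n <- 100000
dbWriteTable(con, "data", data.frame(
  id    = seq_len(n),
  grp   = sample(letters[1:5], n, replace = TRUE),
  value = runif(n)
))

# Pulls every row and every column across the connection into R.
everything <- dbReadTable(con, "data")

# Lets the database filter rows (WHERE) and drop unneeded columns
# (the SELECT list) before anything is sent to R.
subset_rows <- dbGetQuery(con, "
  SELECT id, value
  FROM data
  WHERE grp = 'a' AND value > 0.9
")

nrow(everything)   # 100000
nrow(subset_rows)  # far fewer: only the matching rows
ncol(subset_rows)  # 2

dbDisconnect(con)
```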
Common tactics are:
- a WHERE clause to reduce the rows returned
- GROUP BY + MAX (or another aggregate) to return summaries instead of raw rows
- INSERT queries to write from table to table, so the data doesn't even need to pass through R.

I imagine RMySQL should be faster than the newer odbc package, but it's worth experimenting with.
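The GROUP BY + MAX and table-to-table INSERT tactics might look roughly like this. Same assumptions as before: an in-memory RSQLite database stands in for MySQL, and the table `data`, its columns `grp`/`value`, and the target table `data_summary` are all hypothetical. On MySQL the column types in `CREATE TABLE` would be MySQL types (e.g. VARCHAR, DOUBLE) rather than SQLite's TEXT/REAL.

```r
library(DBI)

# In-memory SQLite as a stand-in for the MySQL server (assumption).
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical source table "data".
set.seed(1)
n <- 100000
dbWriteTable(con, "data", data.frame(
  grp   = sample(letters[1:5], n, replace = TRUE),
  value = runif(n)
))

# GROUP BY + MAX: the database collapses 100,000 rows into one row per group,
# so only 5 rows travel back to R.
group_max <- dbGetQuery(con, "
  SELECT grp, MAX(value) AS max_value
  FROM data
  GROUP BY grp
  ORDER BY grp
")
print(group_max)

# INSERT ... SELECT: the summary is computed and stored inside the database.
# R only sends the SQL text and gets back the number of affected rows.
# (TEXT/REAL are SQLite types; use MySQL types on a MySQL server.)
dbExecute(con, "CREATE TABLE data_summary (grp TEXT, max_value REAL)")
inserted <- dbExecute(con, "
  INSERT INTO data_summary (grp, max_value)
  SELECT grp, MAX(value)
  FROM data
  GROUP BY grp
")
inserted  # 5

dbDisconnect(con)
```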
What's 'forever'? 5 min or 5 hours? Are things still slow once the data get to R? If things are still too slow to be feasible, consider escalating to something like sparklyr.
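To put a number on "forever" and see whether the time goes into fetching or into R-side work, you could time the two steps separately with base R's `system.time()`. This is a sketch on a hypothetical in-memory SQLite table standing in for the MySQL one; the `tapply()` step is only a placeholder for your own processing.

```r
library(DBI)

# In-memory SQLite as a stand-in for the MySQL server (assumption).
con <- dbConnect(RSQLite::SQLite(), ":memory:")
set.seed(1)
n <- 100000
dbWriteTable(con, "data", data.frame(
  grp   = sample(letters[1:5], n, replace = TRUE),
  value = runif(n)
))

# Time spent moving the whole table from the database into R.
t_fetch <- system.time(df <- dbReadTable(con, "data"))

# Time spent on an R-side step once the data are already in memory
# (a hypothetical per-group maximum, standing in for your own code).
t_process <- system.time(res <- tapply(df$value, df$grp, max))

t_fetch[["elapsed"]]
t_process[["elapsed"]]

dbDisconnect(con)
```

If the fetch dominates, pushing work into SQL as described above should help most; if the R step dominates, the bottleneck is in R rather than in RMySQL.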