Reputation: 328
I have been unable to work in R because of how slowly it runs once my datasets are loaded. These datasets total around 8 GB. My machine has 8 GB of RAM, and I have adjusted memory.limit to exceed my RAM, but nothing seems to work. I also used fread from the data.table package to read these files, simply because read.table would not run.
After seeing a similar post on the forum addressing the same issue, I attempted to run gctorture(), but to no avail. R is running so slowly that I cannot even check the length of the list of datasets I have loaded, cannot View them, or do any other basic operation once these datasets are in memory.
I have also tried importing the datasets in pieces (a third of the files at a time), which made the import itself run more smoothly, but it has not changed how slowly R runs afterwards.
Is there any way to get around this issue? Any help would be much appreciated.
Thank you all for your time.
Upvotes: 0
Views: 4287
Reputation: 415
The problem arises because R loads the full dataset into RAM, which usually brings the system to a halt when you try to View your data.
If it is a really huge dataset, first make sure the data contains only the most important columns and rows. Relevant columns can be identified through the domain and world knowledge you have about the problem. You can also try to eliminate rows with missing values.
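For example, here is a minimal sketch that keeps only the needed columns at read time and drops incomplete rows; the file name and column names are placeholders, not from the original question:

    library(data.table)

    # Read only the columns you actually need; skipping unused columns
    # can cut memory use dramatically. File and column names are hypothetical.
    dt <- fread("data.csv", select = c("id", "date", "value"))

    # Drop rows that are missing the key measurement column.
    dt <- na.omit(dt, cols = "value")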
Once this is done, depending on the size of your data, you can try different approaches. One is to use packages like bigmemory and ff. bigmemory, for example, creates a pointer object through which you can access the data on disk without loading it all into memory.
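A minimal sketch of that idea, assuming the data is numeric and lives in a CSV called data.csv (the file and backing-file names are placeholders):

    library(bigmemory)

    # Create a file-backed big.matrix: the data stays on disk and only the
    # pieces you index are pulled into RAM. Note that a big.matrix holds a
    # single numeric type, so character columns would need separate handling.
    x <- read.big.matrix("data.csv", header = TRUE, type = "double",
                         backingfile = "data.bin",
                         descriptorfile = "data.desc")

    dim(x)        # dimensions without loading the full dataset
    x[1:5, ]      # only these rows are brought into memory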
Another approach is parallelism (implicit or explicit). MapReduce is another package that is very useful for handling big datasets.
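As an illustration of explicit parallelism with base R's parallel package (not the MapReduce package itself), assuming you have already split the data into a list of chunks called chunk_list:

    library(parallel)

    # Start one worker per core (leaving one free) and process the chunks
    # independently; chunk_list and the colMeans summary are placeholders
    # for your own data and computation.
    cl <- makeCluster(detectCores() - 1)
    results <- parLapply(cl, chunk_list, function(chunk) {
      colMeans(chunk, na.rm = TRUE)
    })
    stopCluster(cl)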
For more information on these, check out this blog post on rpubs and this old but gold post from SO.
Upvotes: 4