Reputation:
I've created a term-document matrix (TDM) in R which I want to write to a file. It is a large sparse matrix in simple triplet form, roughly 20,000 x 10,000. When I convert it to a dense matrix so I can add columns with cbind, I get out-of-memory errors and the process does not complete. I don't want to increase my RAM.
I also want to:
- bind the tf and tf-idf matrices together
- save the sparse/dense matrix to CSV
- run batch machine learning algorithms such as the J48 implementation in Weka
How do I save/load the dataset and run the batch ML algorithms within memory constraints?
If I can write a sparse matrix to a data store, can I then run ML algorithms in R on that sparse matrix, still within memory constraints?
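For the binding and saving steps above, a minimal sketch of one possible approach, assuming tf and tfidf are placeholder names for simple_triplet_matrix objects produced by the tm/slam packages and that a sparse on-disk format (MatrixMarket) is acceptable instead of CSV:

library(slam)    # tm stores term-document matrices as simple_triplet_matrix
library(Matrix)  # sparse matrix algebra and sparse I/O

# 'tf' and 'tfidf' are placeholders for the two simple_triplet_matrix objects
tf_sparse <- sparseMatrix(i = tf$i, j = tf$j, x = tf$v,
                          dims = c(tf$nrow, tf$ncol),
                          dimnames = tf$dimnames)
tfidf_sparse <- sparseMatrix(i = tfidf$i, j = tfidf$j, x = tfidf$v,
                             dims = c(tfidf$nrow, tfidf$ncol),
                             dimnames = tfidf$dimnames)

# Column-bind the two matrices; the result stays sparse (dgCMatrix),
# so no dense 20,000 x 20,000 object is ever built
combined <- cbind2(tf_sparse, tfidf_sparse)

# Write a sparse on-disk representation rather than a dense CSV,
# which would materialise all the zeros
writeMM(combined, "combined_tdm.mtx")

This keeps everything in sparse form end to end; whether a given learner (e.g. J48 via RWeka) can consume it directly is a separate question, since many R interfaces expect a data frame.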
Upvotes: 4
Views: 1018
Reputation: 4180
A third solution, in addition to those mentioned by @djhurio, is to use cloud computing services such as Amazon EC2. You don't mention exactly how much RAM you require, but from what I could quickly gather from the current price list, these services will get you up to 244 GB of RAM. I doubt you'll need that much in reality, and if all you need is 16-32 GB, the price is not prohibitive at all.
If you are an academic user, you may want to look into RevoScaleR in Revolution R, a commercial version of R that is available for free for academic use. This software handles large objects out of the box.
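As a rough illustration of the kind of out-of-core workflow RevoScaleR supports (a sketch only; the file names and column names are placeholders, and the exact arguments should be checked against the package documentation):

library(RevoScaleR)

# Import a large CSV into RevoScaleR's on-disk .xdf format, so later
# steps process it in chunks rather than loading it fully into RAM
rxImport(inData = "tdm_features.csv", outFile = "tdm_features.xdf",
         overwrite = TRUE)

# Fit a decision tree on the on-disk data. rxDTree is a CART-style
# learner, not Weka's J48, but it plays the same role of a batch tree
# learner that does not need the whole dataset in memory.
# 'label', 'term1', 'term2' are placeholder column names.
fit <- rxDTree(label ~ term1 + term2, data = "tdm_features.xdf")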
Upvotes: 0