Reputation: 2158
I am trying to play with the text mining tools that R offers, but I am facing the following problem since I am running on an old machine.
I want to create a document-term matrix (DTM) using the tm package and the Corpus function. When I create the DTM, I receive an error that R cannot allocate a vector of 4 GB (my machine has 2 GB of memory). How do you handle such a problem in general? In real applications the DTM should be much larger than mine. Is there a way to use an SQL database instead of keeping everything in memory?
I have studied a related post about using the sqldf library to create a temporary SQLite database, but in that case I cannot even create the matrix in the first place.
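For reference, this is roughly the kind of pipeline I mean (a minimal sketch with toy documents standing in for my real, much larger collection):

```r
library(tm)

# Toy documents standing in for my real data
docs <- c("the cat sat on the mat",
          "the dog sat on the log")
corpus <- Corpus(VectorSource(docs))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)

# This is the step that runs out of memory on my real data
dtm <- DocumentTermMatrix(corpus)
inspect(dtm)
```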
Upvotes: 0
Views: 253
Reputation: 363817
How do you handle such a problem in general?
Use a sparse matrix data structure. Without one, text mining is pretty much impossible; with one, I can process hundreds of thousands of documents in a few hundred MB.
I don't work in R myself, but it's bound to have a sparse matrix package somewhere.
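For example, R's Matrix package provides sparse matrix classes. A minimal sketch of the idea, with hypothetical triplet data standing in for real term counts:

```r
library(Matrix)

# Hypothetical triplet representation of a document-term matrix:
# one (document index, term index, count) triple per non-zero entry
i <- c(1, 1, 2, 3)  # document ids
j <- c(2, 5, 5, 1)  # term ids
x <- c(3, 1, 2, 4)  # counts

# Only the non-zero entries are stored, so memory scales with the
# number of observed (document, term) pairs, not nrow * ncol
dtm <- sparseMatrix(i = i, j = j, x = x, dims = c(3, 5))
print(object.size(dtm))
```

As it happens, tm's DocumentTermMatrix already returns a sparse simple_triplet_matrix from the slam package, so the memory blow-up often comes from converting it to a dense matrix (e.g. with as.matrix). Keeping it sparse, and trimming rare terms with removeSparseTerms, is usually how you stay within 2 GB.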
Upvotes: 4