nikosdi

Reputation: 2158

Text Mining with too much data

I am trying to experiment with the text mining tools that R offers, but I am facing the following problem because I am running on an old machine.

I want to create a Document Term Matrix using the tm package and the Corpus function. When I create the DTM I receive an error saying that 4 GB of memory cannot be allocated (my machine has 2 GB). How do you deal with such a problem in general? In real applications the DTM should be much larger than mine. Is there a way to back the matrix with an SQL database instead of keeping it all in memory?
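Roughly, what I am doing looks like the following sketch (the directory path is just a placeholder, not my actual data):

    library(tm)

    # Build a corpus from a directory of plain-text files (placeholder path)
    corpus <- Corpus(DirSource("path/to/texts"))

    # This is the step where the allocation error appears on my machine
    dtm <- DocumentTermMatrix(corpus)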

//I have read a related post about using the sqldf library to create a temporary SQLite database, but in this case I cannot even create the matrix in the first place.

Upvotes: 0

Views: 253

Answers (1)

Fred Foo

Reputation: 363817

How do you deal with such a problem in general?

Use a sparse matrix data structure. Without one, text mining is pretty much impossible. With one, I can process hundreds of thousands of documents in a few hundred MB.

I don't work in R myself, but it's bound to have a sparse matrix package somewhere.
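As a rough illustration in R (untested by me, so treat it as a sketch): tm's DocumentTermMatrix is already backed by a sparse triplet structure from the slam package, and the Matrix package provides general sparse matrices. The corpus below is a toy one just to show the idea; the key point is to keep the data sparse and avoid materialising the dense matrix.

    library(tm)
    library(Matrix)

    # Toy corpus purely for illustration; a real one would come from DirSource etc.
    docs   <- c("text mining with r",
                "sparse matrices save memory",
                "text mining at scale")
    corpus <- Corpus(VectorSource(docs))

    dtm <- DocumentTermMatrix(corpus)   # stored as a sparse triplet matrix

    # Convert to a Matrix-package sparse matrix without ever going dense;
    # avoid as.matrix(dtm), which allocates the full dense matrix.
    sparse_dtm <- sparseMatrix(i = dtm$i, j = dtm$j, x = dtm$v,
                               dims = dim(dtm), dimnames = dimnames(dtm))

    object.size(sparse_dtm)   # memory scales with the number of non-zero entries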

Upvotes: 4
