Marta Karas

Reputation: 5165

R build TermDocumentMatrix with removeSparseTerms parameter

Am I able to remove sparse terms WHILE creating a tm::TermDocumentMatrix object?

I tried:

TermDocumentMatrix(file.corp, control = list(removeSparseTerms=0.998))

but it does not work.

Upvotes: 1

Views: 757

Answers (1)

Ben

Reputation: 42293

No, you cannot remove sparse terms like that with the TermDocumentMatrix function. If you check the help for that function with ?TermDocumentMatrix, you'll see that the options for control are listed in the help for termFreq, and when you look at the help for that function with ?termFreq, you'll see that removeSparseTerms is not listed there. There is, however, a bounds option, which can do a related job.
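As a rough sketch of the bounds route: this assumes a reasonably recent version of tm in which ?TermDocumentMatrix documents a bounds$global control entry (minimum and maximum number of documents a term may appear in). The toy corpus below is made up for illustration and stands in for file.corp. Note that bounds works on document counts, not on a sparsity proportion, so it is related to removeSparseTerms but not identical.

```r
library(tm)

# Hypothetical three-document corpus standing in for file.corp
docs <- c("apple banana cherry", "apple banana", "apple durian")
file.corp <- VCorpus(VectorSource(docs))

# Keep only terms that occur in at least 2 documents (no upper limit).
# Assumption: the installed tm supports bounds$global in the control list.
tdm_bounded <- TermDocumentMatrix(
  file.corp,
  control = list(bounds = list(global = c(2, Inf)))
)

# Inspect which terms survived the filter
Terms(tdm_bounded)
```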

If you just want a one-liner that combines TermDocumentMatrix and removeSparseTerms, you simply flip your line inside-out and that will work fine:

removeSparseTerms(TermDocumentMatrix(file.corp), 0.998)
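To see what the one-liner does, here is a small self-contained sketch. The corpus is invented for illustration and takes the place of file.corp, and the threshold 0.5 is chosen only so that the effect is visible on three documents; with real data you would keep something like 0.998.

```r
library(tm)

# Hypothetical three-document corpus standing in for file.corp
docs <- c("apple banana cherry", "apple banana", "apple durian")
file.corp <- VCorpus(VectorSource(docs))

# Full matrix, before any terms are dropped
tdm_full <- TermDocumentMatrix(file.corp)

# Build the matrix and drop sparse terms in a single expression.
# The sparse argument must lie strictly between 0 and 1.
tdm_small <- removeSparseTerms(TermDocumentMatrix(file.corp), 0.5)

# Compare the vocabularies before and after
Terms(tdm_full)
Terms(tdm_small)
```

Terms that are absent from too large a share of the documents (here, those found in only one of the three) should be gone from tdm_small, while the more widely used terms remain.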

I recommend you have a careful look at the documentation for the tm package; it's one of the better examples of a well-documented contributed package. It might save you time waiting for someone to answer your questions here!

Upvotes: 1
