Reputation: 13929
I have a document-term matrix in CLUTO format:
#Document #Term #TotalItem
term-x weight-x term-y weight-y (nonzero terms only, one row per document)
Instead of building it from a corpus, I want to create a DocumentTermMatrix (tm package) from this file. Is this possible?
Cluto File:
2 3 3
1 3 3 4
2 8
Row File:
car
plane
Column File:
x
y
z
Solution:
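library(slam)  # provides read_stm_CLUTO()
library(tm)    # provides as.DocumentTermMatrix() and weightTf
# `file` is the path to the CLUTO matrix file
dtm <- as.DocumentTermMatrix(read_stm_CLUTO(file), weighting = weightTf)
rows <- scan("rows.txt", what = "", sep = "\n")
columns <- scan("columns.txt", what = "", sep = "\n")
dtm$dimnames <- list(Docs = rows, Terms = columns)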
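To sanity-check the result, the example file can also be decoded by hand. The following is only a minimal base-R sketch of how I read the CLUTO sparse layout (header line, then term-index/weight pairs per document). It is not how slam implements read_stm_CLUTO, it assumes every document has at least one nonzero term, and the variable names (cluto_lines, m, idx, wts) are made up for illustration.

```r
# Example CLUTO file from the question, held in memory instead of on disk.
cluto_lines <- c("2 3 3", "1 3 3 4", "2 8")
rows <- c("car", "plane")
columns <- c("x", "y", "z")

# Header: number of documents, number of terms, number of nonzero entries.
header <- as.integer(strsplit(cluto_lines[1], " +")[[1]])
m <- matrix(0, nrow = header[1], ncol = header[2],
            dimnames = list(Docs = rows, Terms = columns))

# Each following line holds (term index, weight) pairs for one document.
for (i in seq_len(header[1])) {
  v <- as.numeric(strsplit(trimws(cluto_lines[i + 1]), " +")[[1]])
  idx <- v[seq(1, length(v), by = 2)]  # 1-based term indices
  wts <- v[seq(2, length(v), by = 2)]  # matching weights
  m[i, idx] <- wts
}
print(m)
```

For the example above this puts weights 3 and 4 on x and z for car, and weight 8 on y for plane, so as.matrix(dtm) should show the same values if the import worked.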
Upvotes: 0
Views: 410
Reputation: 42293
This should do it:
library(slam)  # read_stm_CLUTO()
library(tm)    # as.DocumentTermMatrix(), weightTf
as.DocumentTermMatrix(read_stm_CLUTO(file), weighting = weightTf)
If you can link to your CLUTO file or add an excerpt of it to your question, we can look at the row and column names.
Upvotes: 1