Reputation: 1736
When I use DocumentTermMatrix on my corpus, it lowercases the words. I'd like to preserve the camel case. How do I do it?
library(tm)
as.matrix(DocumentTermMatrix(Corpus(VectorSource(c("Hello", "World")))))
I'd like the column names to be Hello and World instead of hello and world.
Upvotes: 2
Views: 779
Reputation: 53
The capitalize function in library(Hmisc) does the job for me as a beginner.
library(tm)
library(Hmisc)
terms <- as.matrix(DocumentTermMatrix(Corpus(VectorSource(c("Hello", "World")))))
colnames(terms) <- capitalize(colnames(terms))
terms
#     Terms
# Docs Hello World
#    1     1     0
#    2     0     1
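One caveat worth checking before relying on this: capitalizing the first letter only restores terms that were originally written with a leading capital and nothing else. The sketch below uses a hypothetical base-R helper, cap_first, as a stand-in for first-letter capitalization (it is an assumption for illustration, not Hmisc's implementation), and the sample words are made up.

```r
# Hypothetical base-R stand-in for first-letter capitalization
# (an assumption for illustration, not Hmisc's actual implementation).
cap_first <- function(x) paste0(toupper(substring(x, 1, 1)), substring(x, 2))

original <- c("Hello", "iPhone", "world")
lowered  <- tolower(original)    # mimics the lowercasing described in the question
restored <- cap_first(lowered)

# Compare the restored terms with the originals
data.frame(original, restored, same = original == restored)
```

Only terms shaped like "Hello" come back unchanged; mixed-case or all-lowercase terms do not, so this approach fits best when every term is known to start with a single capital.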
Upvotes: 0
Reputation: 23109
You can try the following hack:
library(tm)
words <- c("Hello", "World")
tdm <- as.data.frame(as.matrix(DocumentTermMatrix(Corpus(VectorSource(words)))))
names(tdm) <- sort(words) # terms come out in alphabetical order, so sort the originals to match
tdm
# Hello World
#1 1 0
#2 0 1
A cleaner way to do the same is to turn off lowercasing with the tolower control option:
words <- c("Hello", "World")
tdm <- as.data.frame(as.matrix(DocumentTermMatrix(Corpus(VectorSource(factor(words))),
control=list(tolower=FALSE))))
tdm
# Hello World
#1 1 0
#2 0 1
Upvotes: 2