Chris Chung

Chinese Text Mining

I used Chinese word segmentation for text mining. After I converted the data to a data frame, the terms contained commas and double quotation marks, so the word cloud looks strange, like this: (image: strange wordcloud)

Output of inspect(d.corpus): (image)

My code is below:

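library(tm)
library(wordcloud)

# Rebuild the corpus from a data frame of the segmented text
d.corpus <- Corpus(DataframeSource(data.frame(as.character(d.corpus))))
# Term-document matrix, keeping terms of 2 or more characters
tdm <- TermDocumentMatrix(d.corpus, control = list(wordLengths = c(2, Inf)))
m1 <- as.matrix(tdm)
# Total frequency of each term across documents, highest first
v <- sort(rowSums(m1), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
wordcloud(d$word, d$freq, min.freq = 5, random.order = FALSE, ordered.colors = FALSE,
    colors = rainbow(length(row.names(m1))))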

How can I clean up the data so the terms no longer contain these characters?

I tried splitting this line into steps:

d.corpus <- Corpus(DataframeSource(data.frame(as.character(d.corpus))))

Why does as.character(d.corpus) return 3 rows?

test1 <- as.character(d.corpus)
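A likely explanation for both symptoms, offered as a hedged sketch: in tm 0.6 and later a VCorpus appears to be stored internally as a list with three components (content, meta, dmeta), and base R's as.character() on a list returns one string per component, deparsing any element that is not a single string. That would give 3 rows, with each document's words rendered as c("...", "...") text, which is where the quotes and commas come from. The example below uses a plain list as a hypothetical stand-in for the corpus (the names docs and fake_corpus and the English tokens are made up), so it runs without tm:

```r
# Hypothetical stand-in for segmented documents: each document is a
# character vector of words, as a word segmenter might return.
docs <- list(c("text", "mining"), c("chinese", "word", "segment"))

# Assumed layout of a tm VCorpus: a list with three components.
fake_corpus <- list(content = docs, meta = list(), dmeta = list())

# as.character() on a list gives one string per component: 3 here
x <- as.character(fake_corpus)
length(x)

# Multi-word documents are deparsed, so quotes and commas end up in the text
as.character(docs)
```

If this matches the real corpus, the stray characters are introduced by as.character() itself rather than by the segmentation step.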

Answers (1)

Chris Chung

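I solved it by using a for loop to edit the names(v) data, removing the double quotation marks and commas from each term before building the data frame d: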

# Strip double quotes and commas from every term name
for (i in seq_along(names(v)))
{
    names(v)[i] <- gsub('[",]', '', names(v)[i])
}
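A vectorised sketch of the same clean-up, assuming v is the named frequency vector from the question's code. gsub() already works on a whole character vector, so the loop is not strictly needed. After stripping, two entries can collapse to the same word, so this version also sums the duplicates. The counts in v below are made up for illustration:

```r
# Toy named frequency vector standing in for v (made-up counts);
# some term names carry stray quotes and commas.
v <- c('"text",' = 5, "mining" = 4, '"mining",' = 2, "cloud" = 1)

# Vectorised equivalent of the loop: strip every " and , from all names at once
names(v) <- gsub('[",]', "", names(v))

# Cleaning can make two names identical ("mining" twice), so sum the
# counts per word and re-sort before building the data frame
v <- vapply(split(v, names(v)), sum, numeric(1))
v <- sort(v, decreasing = TRUE)
d <- data.frame(word = names(v), freq = as.vector(v), stringsAsFactors = FALSE)
d
```

The resulting d can then be passed to wordcloud(d$word, d$freq, ...) as in the question.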

Result: (image)