Mars_Tina

Reputation: 31

TM Package: Error in UseMethod("TermDocumentMatrix", x)

I want to plot a term-document matrix like Figure 6 in the JSS article on the tm package (article link: https://www.jstatsoft.org/article/view/v025i05).

My corpus, speach-English.txt, is here: https://github.com/yushu-liu/speach-english.git

The figure should look like this:

[image]

Here is my code:

library(tm)
library(stringr)
library(wordcloud)

text <- paste(readLines("D:/Rdata/speach-English.txt"), collapse = " ")
text_tidy <- gsub(pattern = "\\W",replace=" ",text)
text_tidy2 <- gsub(pattern = "\\d",replace=" ",text_tidy)

text_tidy2 <- tolower(text_tidy2)
text_tidy2 <- removeWords(text_tidy2,stopwords())
text_tidy2 <- gsub(pattern = "\\b[A-z]\\b{1}",replace=" ", text_tidy2 )
text_tidy2 <- stripWhitespace(text_tidy2)

textbag <- str_split(text_tidy2,pattern = "\\s+")
textbag <- unlist(textbag)

tdm <- TermDocumentMatrix(textbag, control = list(removePunctuation = TRUE,
                                                removeNumbers = TRUE,
                                                stopwords = TRUE))

plot(tdm, terms = findFreqTerms(tdm, lowfreq = 6)[1:25], corThreshold = 0.5)

But I get this error:

Error in UseMethod("TermDocumentMatrix", x) : 
  no applicable method for 'TermDocumentMatrix' applied to an object of class "character"

Why? Thanks!

Upvotes: 0

Views: 3772

Answers (1)

Manuel Bickel

Reputation: 2206

The problem is that TermDocumentMatrix() needs an object of the Corpus class, and you passed it a plain character vector without creating a corpus first. See below for an example of how you could do that.

Another point: in your line str_split(text_tidy2,pattern = "\\s+") you split your text into unigrams (individual terms), so every document ends up containing exactly one term. Creating a term-document matrix from this structure does not make much sense. What is the intended purpose of this line? Maybe I can point you to what you want.
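To illustrate the point about unigrams, here is a minimal sketch on made-up data (the two toy lines and the variable names are assumptions, not taken from the speech file). It compares a corpus built from whole lines with one built from the same text split into single words.

```r
library(tm)

# Two toy lines standing in for the speech text (made-up data).
speech_lines <- c("peace requires courage", "courage requires patience")

# VectorSource() treats every element of the vector as one document.
by_line <- VCorpus(VectorSource(speech_lines))
by_word <- VCorpus(VectorSource(unlist(strsplit(speech_lines, "\\s+"))))

length(by_line)  # 2 documents, three terms each
length(by_word)  # 6 documents, one term each

# With one term per document, no two terms ever share a document,
# so there are no co-occurrences for a correlation plot to show.
tdm_word <- TermDocumentMatrix(by_word)
max(colSums(as.matrix(tdm_word)))  # 1
```

Keeping one line (or paragraph) per vector element is what gives the matrix documents in which terms can co-occur.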

library(tm)
text <- readLines("https://raw.githubusercontent.com/yushu-liu/speach-english/master/speach-English.txt")
# First define the type of source you want to use and how it shall be read
x <- VectorSource(text)
# Create a corpus object
x <- VCorpus(x)
# Feed it to TermDocumentMatrix()
tdm <- TermDocumentMatrix(x)
tdm
# <<TermDocumentMatrix (terms: 4159, documents: 573)>>
# Non-/sparse entries: 14481/2368626
# Sparsity           : 99%
# Maximal term length: 21
# Weighting          : term frequency (tf)
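From there, the cleaning and plotting steps from the question could look roughly like the sketch below. It uses three made-up lines instead of the real file so that it runs offline, and the names speech_lines, corpus and frequent are only illustrative. As far as I know, the plot method for term-document matrices depends on the Bioconductor package Rgraphviz, so the plot call is guarded.

```r
library(tm)

# Made-up lines standing in for readLines() on the speech file.
speech_lines <- c("Peace requires courage, and courage requires patience.",
                  "In 1945 the world chose hope over war.",
                  "Patience and courage built the peace we have today.")

corpus <- VCorpus(VectorSource(speech_lines))

# Let tm do the cleaning through the control list instead of gsub() calls.
tdm <- TermDocumentMatrix(corpus, control = list(removePunctuation = TRUE,
                                                 removeNumbers = TRUE,
                                                 stopwords = TRUE))

# Terms that occur at least twice across the whole corpus.
frequent <- findFreqTerms(tdm, lowfreq = 2)
frequent

# Only attempt the plot when Rgraphviz is installed.
if (requireNamespace("Rgraphviz", quietly = TRUE)) {
  plot(tdm, terms = frequent, corThreshold = 0.5)
}
```

With the real speech you would swap speech_lines for the result of readLines() and raise lowfreq back to something like 6, as in the question.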

Upvotes: 3
