Reputation: 540
I am having trouble with the tm package in R (version 0.6.2). This question (two different errors) has already been answered here and here, but I still get an error after applying the posted solutions. Please click here to download the dataset (93 rows only); it is a reproducible example. The two errors are below:
(Resolved) Error in UseMethod("meta", x) : no applicable method for 'meta' applied to an object of class "character"
Error: inherits(doc, "TextDocument") is not TRUE
Please tell me what is wrong with my approach.
--
# Data import
df.imp<- read.csv("Phone2_Sample100_NegPos.csv", header = TRUE, as.is = TRUE)
##### Data Pre-Processing
install.packages("tm")
require(tm)
ds.corpus<- Corpus(VectorSource(df.imp$Content))
ds.corpus<- tm_map(ds.corpus, content_transformer(tolower))
ds.corpus<- tm_map(ds.corpus, content_transformer(removePunctuation))
ds.corpus<- tm_map(ds.corpus, content_transformer(removeNumbers))
removeURL<- function(x) gsub("http[[:alnum:]]*", "", x)
ds.corpus<- tm_map(ds.corpus,removeURL)
stopwords.default<- stopwords("english")
stopWordsNotDeleted<- c("isn't" , "aren't" , "wasn't" , "weren't" , "hasn't" ,
"haven't" , "hadn't" , "doesn't" , "don't" ,"didn't" ,
"won't" , "wouldn't", "shan't" , "shouldn't", "can't" ,
"cannot" , "couldn't" , "mustn't", "but","no", "nor", "not", "too", "very")
stopWord.new<- stopwords.default[! stopwords.default %in% stopWordsNotDeleted] ## new Stopwords list
ds.corpus<- tm_map(ds.corpus, removeWords, stopWord.new )
copy<- ds.corpus ## creating a copy to be used as a dictionary
ds.corpus<- tm_map(ds.corpus, stemDocument)
## error Statement #1
ds.corpus<- stemCompletion(ds.corpus, dictionary = copy)
## Error in UseMethod("meta", x) : no applicable method for 'meta' applied to an object of class "character"
ds.cleanCorpus<- tm_map(ds.corpus, PlainTextDocument) ## creating plain text document
class(ds.cleanCorpus) ## output is "VCorpus" "Corpus". What should it be?
## error Statement #2
tdm<- TermDocumentMatrix(ds.corpus) ## creating term document matrix
inherits(ds.cleanCorpus, "TextDocument") ## returns FALSE
Update: I figured out the first error: stemCompletion's x parameter must be a character vector, while dictionary can be either a corpus or a character vector. However, when I tried it on the first document of ds.corpus (a character vector), as below, the stemmed words were not completed; the output is just the stemmed character vector, as before.
stemCompletion(ds.corpus[[1]]$content, dictionary = copy)
So now my main question is: "How to complete a stemmed corpus from a dictionary (tm package)?" The stemCompletion method doesn't seem to work (on a character vector). Secondly, how can I complete the stemming of an entire corpus? Should I use a for loop over the content of each document in the corpus?
Upvotes: 3
Views: 6117
Reputation: 33
Not sure if you have found a solution already. I was helped by this post, "stemCompletion is not working", and I believe it answers your second question, "How to complete a stemmed corpus from a dictionary (tm package)?" (as well as mine, which is similar to yours). Specifically, you can try the following code:
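# Split each document into words, complete every stem against dictionary d,
# then paste the words back together into one string.
stem_completion <- tm_map(ds.corpus,
  content_transformer(function(x, d)
    paste(stemCompletion(strsplit(stemDocument(x), ' ')[[1]], d),
          collapse = ' ')),
  d = copy)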
Upvotes: 1
Reputation: 313
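There are two things you need to change.
When you pass a custom function to tm_map, you need to wrap it in content_transformer. Without the wrapper, the documents are replaced by plain character strings, which appears to be what triggers the inherits(doc, "TextDocument") error later on: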
removeURL<- function(x) gsub("http[[:alnum:]]*", "", x)
ds.corpus<- tm_map(ds.corpus,content_transformer(removeURL))
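To check this on something smaller than the full dataset, here is a minimal sketch using a two-document toy corpus. The sample strings and the names docs and cleaned are made up for illustration; only the tm calls themselves come from the package.

```r
library(tm)

# Toy corpus standing in for df.imp$Content (made-up text).
docs <- VCorpus(VectorSource(c("see httpexamplecom now", "no link here")))

removeURL <- function(x) gsub("http[[:alnum:]]*", "", x)

# Wrapping the custom function keeps every element a TextDocument.
cleaned <- tm_map(docs, content_transformer(removeURL))

inherits(cleaned[[1]], "TextDocument")
class(cleaned)
```

If the inherits() check holds for your own corpus after each tm_map step, TermDocumentMatrix should accept it.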
The purpose of stemCompletion is to complete a stemmed word (see https://en.wikipedia.org/wiki/Stemming) based on a dictionary. The stemmed words must be passed as a character vector, one stem per element; the dictionary can be a corpus.
x <- c("compan", "entit", "suppl")
stemCompletion(x, copy)
output:
compan entit suppl
"companies" "" "supplies"
Code to create the term-document matrix:
# Data import
df.imp<- read.csv("data/Phone2_Sample100_NegPos.csv", header = TRUE, as.is = TRUE)
##### Data Pre-Processing
#install.packages("tm")
require(tm)
ds.corpus<- Corpus(VectorSource(df.imp$Content))
ds.corpus<- tm_map(ds.corpus, content_transformer(tolower))
ds.corpus<- tm_map(ds.corpus, content_transformer(removePunctuation))
ds.corpus<- tm_map(ds.corpus, content_transformer(removeNumbers))
removeURL<- function(x) gsub("http[[:alnum:]]*", "", x)
ds.corpus<- tm_map(ds.corpus,content_transformer(removeURL))
stopwords.default<- stopwords("english")
stopWordsNotDeleted<- c("isn't" , "aren't" , "wasn't" , "weren't" , "hasn't" ,
"haven't" , "hadn't" , "doesn't" , "don't" ,"didn't" ,
"won't" , "wouldn't", "shan't" , "shouldn't", "can't" ,
"cannot" , "couldn't" , "mustn't", "but","no", "nor", "not", "too", "very")
stopWord.new<- stopwords.default[! stopwords.default %in% stopWordsNotDeleted] ## new Stopwords list
ds.corpus<- tm_map(ds.corpus, removeWords, stopWord.new )
tdm<- TermDocumentMatrix(ds.corpus)
copy<- ds.corpus ## creating a copy to be used as a dictionary
x <- c("compan", "entit", "suppl")
stemCompletion(x, copy)
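Since stemCompletion expects one stem per element, a whole document has to be split into words first. The following is only a sketch under a few assumptions: complete_stems is a hypothetical helper (not part of tm), the small dictionary vector is made up, and type = "first" is chosen just to keep the behaviour simple. Stems with no completion are kept as they are instead of being dropped.

```r
library(tm)

# Hypothetical helper: complete every stem in a single string, falling back
# to the stem itself when the dictionary offers no completion.
complete_stems <- function(text, dictionary) {
  stems <- unlist(strsplit(text, "[[:space:]]+"))
  stems <- stems[nzchar(stems)]
  if (length(stems) == 0) return("")
  completed <- unname(stemCompletion(stems, dictionary, type = "first"))
  # Unmatched stems may come back as NA or "", depending on the tm version.
  unmatched <- is.na(completed) | !nzchar(completed)
  completed[unmatched] <- stems[unmatched]
  paste(completed, collapse = " ")
}

# Made-up dictionary; in the code above this role is played by `copy`.
dictionary <- c("companies", "supplies", "phone")

# Apply the helper to every document of a (toy) stemmed corpus.
stemmed <- VCorpus(VectorSource(c("compan suppl", "phone xyz")))
completed <- tm_map(stemmed, content_transformer(complete_stems),
                    dictionary = dictionary)

content(completed[[1]])
content(completed[[2]])
```

This avoids an explicit for loop over the documents: tm_map does the iteration, and the extra dictionary argument is passed through to the helper.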
Upvotes: 3