Dear Stack Overflow community,
I have an issue when trying to complete a stemmed Corpus in R using the function stemCompletion within the tm package (https://cran.r-project.org/web/packages/tm/tm.pdf).
I have used this function successfully in the past. However, it no longer works, not even with datasets it previously worked on.
As input to the function I use a preprocessed VCorpus. Every preprocessing step (tolower, removePunctuation, stripWhitespace, removeNumbers, removeWords, stemDocument) works fine; only stemCompletion fails.
Here is part of the code I use:
library(tm)
# Load Data
Data <- read.csv("AMAZON_FASHION_5.csv", header = TRUE, sep = ";", dec = ",", colClasses = "character", fill = TRUE)
# View Data
View(Data)
Data <-data.frame(Data)
# Define Corpus
Data$reviewText <- iconv(Data$reviewText, "WINDOWS-1252", sub = "byte")
Text <- VCorpus(VectorSource(Data$reviewText))
writeLines(strwrap(as.character(Text[[69]])))
#Output:
#was terribly disappointed the pants were way too large in the legs my husband looked
#like he was wearing blown up clown pants
### Some code to preprocess the data is then run (omitted here)
# Create a PlainTextDocument
Text <- tm_map(Text, PlainTextDocument)
# Create a copy of object "Text" to use later as a dictionary for stemming completion
Text.copy <- Text
# Stem document
Text_stemmed <- tm_map(Text, stemDocument, language = "english")
# Show comment no. 69
writeLines(strwrap(as.character(Text_stemmed[[69]])))
#Output:
#terribl disappoint pant way larg leg husband look like wear blown clown pant
# Complete the stems, using the unstemmed copy as the dictionary
Text_comp <- stemCompletion(Text_stemmed, dictionary = Text.copy, type = "prevalent")
# Show comment no. 69
writeLines(strwrap(as.character(Text_comp[[69]])))
#Output:
#character(0)
Can anybody help? What might be the issue here?
I also tried running stemCompletion without the PlainTextDocument step beforehand. However, this produced the following output:
writeLines(strwrap(as.character(Text_comp[[69]])))
69
stemCompletion seems to return an object of class "character" rather than a corpus, since calling
meta(Text_comp)
inspect(Text_comp)
on this object fails:
Error in UseMethod("meta", x) :
no applicable method for 'meta' applied to an object of class "character"
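A minimal sketch of this behavior, assuming a hypothetical three-word stem vector and dictionary (toy values for illustration, not taken from my dataset). As far as I understand the tm documentation, stemCompletion takes a character vector of stems and a dictionary that is either a corpus or a character vector, so this calls it directly on character input and prints the class of the result:

```r
library(tm)

# Hypothetical toy stems and dictionary, for illustration only
stems <- c("terribl", "disappoint", "pant")
dictionary_words <- c("terribly", "disappointed", "pants")

# Call stemCompletion on a plain character vector of stems
completed <- stemCompletion(stems, dictionary = dictionary_words, type = "prevalent")

# Inspect what kind of object comes back
print(class(completed))
print(completed)
```

If the result here is also a character vector, then the error from meta() and inspect() above would be expected whenever the output of stemCompletion is treated as a corpus.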
I also tried the modified stemCompletion approach suggested by Zhao (https://drive.google.com/file/d/1JSlWQLPrAUrtdLrGFuS8kckxhqHp885f/view; also mentioned in the Stack Overflow post "Issue with stemCompletion of Corpus for text mining in R (tm package)"). However, this did not produce the desired result either.