Reputation: 376
I have been trying to work through the following tutorial: http://www.rdatamining.com/examples/text-mining. However, instead of using the Twitter data, I have been using a .csv file (unfortunately the contents are sensitive and cannot be made public).
The .csv file has two columns: a user key in column A and a piece of narrative text (Response) in column B. The file was opened with the following code:
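# Read the file, keeping text columns as character rather than factor
Data <- read.csv(file="PATH/FILE.csv", header=TRUE, sep=",", stringsAsFactors=FALSE)
# Drop rows with an empty Response
Data <- Data[!(Data$Response==""), ]
# Turn each Response into a list item and bind the items row-wise
df <- do.call("rbind", lapply(Data$Response, as.list))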
df is a 'list of 91' with each item in the list being of type "character".
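For illustration, the loading step can be sketched in base R with a made-up data frame standing in for the private file. The column names and values below are hypothetical, and the extra NA check is an assumption about what the real data might contain, not something reported in the question.

```r
# Hypothetical stand-in for the private .csv (the real data is not public).
Data <- data.frame(
  Key = c("u1", "u2", "u3", "u4"),
  Response = c("First narrative text", "", NA, "Another response"),
  stringsAsFactors = FALSE
)

# Keep rows whose Response is neither NA nor the empty string.
# Data$Response == "" alone yields NA for missing values, which would
# leave all-NA rows in the subset.
keep <- !is.na(Data$Response) & nzchar(Data$Response)
Data <- Data[keep, ]

# A plain character vector, one element per document.
responses <- Data$Response
print(responses)
```

A character vector like this can be passed to tm's VectorSource() directly, so the do.call/rbind step may not be needed.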
The tutorial is followed from the line library(tm) onward with no differences, except for the addition of
NarrativeCorpus <- tm_map(NarrativeCorpus, PlainTextDocument)
after the line
myCorpus <- tm_map(myCorpus, removeWords, myStopwords)
which I found was needed for stemming.
The code fails at the stem completion step, myCorpus <- tm_map(myCorpus, stemCompletion, dictionary=dictCorpus)
with the following error:
Error in grep(sprintf("^%s", w), dictionary, value = TRUE) : invalid regular expression, reason 'Out of memory'
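The error message shows the call that fails: each value of w is pasted into a regular expression and matched against the dictionary. The base R sketch below shows how that pattern behaves when w is a clean stem and when it is not. The dictionary, the stems, and the two helper functions are hypothetical, and the idea that the failing w was something other than a single clean stem (for example, text containing regex metacharacters or a whole document's text) is an assumption, not a confirmed diagnosis.

```r
# Hypothetical dictionary, for illustration only.
dictionary <- c("mining", "miners", "text", "texts")

# The call shown in the error message, applied to one stem at a time.
complete_regex <- function(w, dictionary) {
  grep(sprintf("^%s", w), dictionary, value = TRUE)
}

# A literal prefix match that never interprets w as a regular expression.
complete_literal <- function(w, dictionary) {
  dictionary[substr(dictionary, 1, nchar(w)) == w]
}

# With a clean stem, both approaches return the same candidates.
print(complete_regex("min", dictionary))
print(complete_literal("min", dictionary))

# "." is a regex metacharacter, so "text." matches "texts" as a pattern
# but matches nothing as a literal prefix.
print(complete_regex("text.", dictionary))
print(complete_literal("text.", dictionary))

# An unbalanced "(" is not a valid pattern, so the regex version errors.
bad <- tryCatch(complete_regex("(", dictionary),
                error = function(e) conditionMessage(e))
print(bad)
```

Checking what is actually passed as w (for example, whether it is one word or an entire document) may help narrow down where the invalid pattern comes from.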
I have searched online and on Stack Overflow with little luck.
I have tried converting the reference dictionary into a list of unique words then back into a corpus (to reduce its size) but to no avail.
I am using R 64-bit 3.2.3 with RStudio Desktop 0.99.891 on a Windows 7 laptop with 4GB RAM. All packages are up to date (according to RStudio).
This is my first SO post, so I welcome advice on what I should have included and why, etc.
Upvotes: 1
Views: 1663
Reputation: 666
I had a similar issue: Error in grep(sprintf("^%s", w), dictionary, value = TRUE) : invalid regular expression
After searching on SO, I found the solution in this thread, which in turn came from this website.
This code should be added after loading your corpus:
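# Re-encode the text; 'UTF-8-MAC' is likely macOS-specific, so to='UTF-8' may be needed elsewhere
toUTF8 <- function(x) iconv(x, to='UTF-8-MAC', sub='byte')
# The helper is named toUTF8 so that it does not mask tm's own content_transformer()
myCorpus <- tm_map(myCorpus, content_transformer(toUTF8))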
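As a rough illustration of what the sub='byte' argument does, here is a base R sketch on a made-up character vector. The input is hypothetical, and to = "UTF-8" is used instead of 'UTF-8-MAC' on the assumption that the latter is only available on macOS.

```r
# Hypothetical input: the second element contains the raw byte 0xE9,
# which is not valid on its own in UTF-8.
x <- c("plain ascii", "caf\xe9")

# Declare the input as UTF-8 and convert to UTF-8; bytes that cannot be
# converted are replaced by their hex code in angle brackets instead of
# turning the whole string into NA.
clean <- iconv(x, from = "UTF-8", to = "UTF-8", sub = "byte")

print(clean)
```

After this step the strings contain no invalid byte sequences, which is presumably why it helps before the text is used to build regular expressions.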
Good luck
Upvotes: 0