Reputation: 11
First of all, my apologies for repeating a question that was asked on Aug 1 '13. I cannot comment on the original question because commenting requires 50 reputation, which I don't have. The original question is "R text mining documents from CSV file (one row per doc)".
I am trying to work with the tm package in R, and have a CSV file of article abstracts with each line being a different abstract. I want each line to be a different document within the corpus. There are 2,000 rows in my data set.
I ran the following code, as previously suggested by Ben:
# change this file location to suit your machine
file_loc <- "C:/Users/.../docs.csv"
# change TRUE to FALSE if you have no column headings in the CSV
x <- read.csv(file_loc, header = TRUE)
require(tm)
corp <- Corpus(DataframeSource(x))
docs <- DocumentTermMatrix(corp)
When I check the class of the result:
# checking class
class(docs)
[1] "DocumentTermMatrix" "simple_triplet_matrix"
The problem is that tm transformations do not work on this class:
# Preparing the Corpus
# Simple Transforms
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
docs <- tm_map(docs, toSpace, "/")
I get this error:
Error in UseMethod("tm_map", x) :
no applicable method for 'tm_map' applied to an object of class "c('DocumentTermMatrix', 'simple_triplet_matrix')"
Or, with a different pattern:
docs <- tm_map(docs, toSpace, "/|@|nn|")
I get the same error:
Error in UseMethod("tm_map", x) :
no applicable method for 'tm_map' applied to an object of class "c('DocumentTermMatrix', 'simple_triplet_matrix')"
Your help would be greatly appreciated.
Upvotes: 1
Views: 3320
Reputation: 11
The line
docs <- tm_map(docs, toSpace, "/|@|nn|")
must be replaced with the following, which escapes the literal | character:
docs <- tm_map(docs, toSpace, "/|@|\\|")
Then the pattern should work as intended.
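Note that the error in the question is a method dispatch failure: tm_map is defined for corpus objects, and docs in the question holds a DocumentTermMatrix. Correcting the pattern alone may therefore not be enough. Below is a minimal sketch of the order that avoids this: transform the corpus first and build the matrix afterwards. The two-row data frame and its column name abstract are hypothetical stand-ins for the real CSV, and VectorSource is used here instead of DataframeSource only to keep the example self-contained.

```r
library(tm)

# Hypothetical stand-in for the CSV contents: one abstract per row.
# The column name `abstract` is an assumption, not taken from the question.
x <- data.frame(
  abstract = c("text mining/analysis with R",
               "contact user@example.org | tm package"),
  stringsAsFactors = FALSE
)

# One corpus document per element of the character vector.
corp <- VCorpus(VectorSource(x$abstract))

# Same helper as in the question: replace every match of `pattern` with a space.
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))

# Apply the transformation to the corpus, not to the document-term matrix.
# In an R string, "\\|" is the regex \| and matches a literal pipe.
corp <- tm_map(corp, toSpace, "/|@|\\|")

# Build the document-term matrix only after the corpus has been cleaned.
dtm <- DocumentTermMatrix(corp)
```

With the real data, x would come from read.csv as in the question, and the character vector passed to VectorSource would be whichever column holds the abstracts.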
Upvotes: 0