Sahara

R text mining documents from CSV file

First of all, apologies for repeating a question that was asked on Aug 1 '13. I cannot comment on the original question because commenting requires 50 reputation, which I don't have. The original question is "R text mining documents from CSV file (one row per doc)".

I am trying to work with the tm package in R, and have a CSV file of article abstracts with each line being a different abstract. I want each line to be a different document within the corpus. There are 2,000 rows in my data set.

I ran the following code, as previously suggested by Ben:

library(tm)
# change this file location to suit your machine
file_loc <- "C:/Users/.../docs.csv"
# change TRUE to FALSE if you have no column headings in the CSV;
# stringsAsFactors = FALSE keeps the abstracts as plain character strings
x <- read.csv(file_loc, header = TRUE, stringsAsFactors = FALSE)
corp <- Corpus(DataframeSource(x))
docs <- DocumentTermMatrix(corp)

When I check the class:

# checking class
class(docs)
[1] "DocumentTermMatrix"    "simple_triplet_matrix" 

The problem is that tm transformations do not work on this class:

# Preparing the Corpus
# Simple Transforms
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
docs <- tm_map(docs, toSpace, "/")

I get this error:

Error in UseMethod("tm_map", x) : 
no applicable method for 'tm_map' applied to an object of class "c('DocumentTermMatrix', 'simple_triplet_matrix')"

The same happens with a different pattern:

docs <- tm_map(docs, toSpace, "/|@|nn|")

I get the same error:

Error in UseMethod("tm_map", x) : 
no applicable method for 'tm_map' applied to an object of class "c('DocumentTermMatrix', 'simple_triplet_matrix')"

Your help would be greatly appreciated.

Answers (1)

Sahara

The code

docs <- tm_map(docs, toSpace, "/|@|nn|")

must be replaced with

docs <- tm_map(docs, toSpace, "/|@|\\|")
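Since toSpace only wraps gsub(), the difference between the two patterns can be checked directly in base R. The sketch below uses a made-up test string; in the original pattern the trailing `|` adds an empty alternative, while `\\|` in the fixed pattern matches a literal pipe character.

```r
# toSpace (from the question) calls gsub(pattern, " ", x); compare both patterns
x <- "a/b@c|d"

# Fixed pattern: "/", "@", or a literal "|" (written "\\|" inside an R string)
fixed <- gsub("/|@|\\|", " ", x)
print(fixed)

# Original pattern: the trailing "|" adds an empty alternative that matches
# the empty string, so extra spaces are inserted and the literal "|" is kept
broken <- gsub("/|@|nn|", " ", x)
print(broken)
```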

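With this pattern the substitution behaves as intended. Note, however, that the error in the question comes from calling tm_map() on the DocumentTermMatrix: tm_map() only accepts a corpus, so the transformations must be applied to corp (the object returned by Corpus()) before DocumentTermMatrix() is called.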
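A minimal sketch of the full pipeline in the right order, assuming the abstracts sit in a column with the hypothetical name `text` (the real column name in docs.csv is not known; a small inline data frame stands in for the CSV). It uses VectorSource() on that column to get one document per row; recent tm versions expect DataframeSource() input to have `doc_id` and `text` columns, so check ?DataframeSource for your installed version if you prefer that route.

```r
library(tm)

# Hypothetical stand-in for read.csv(): one abstract per row in column "text"
x <- data.frame(
  text = c("cats/dogs and birds", "email@example", "left|right"),
  stringsAsFactors = FALSE
)

# One document per row: VectorSource() treats each element as a document
corp <- VCorpus(VectorSource(x$text))

# toSpace from the question, applied to the corpus (not to the DTM)
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
corp <- tm_map(corp, toSpace, "/|@|\\|")

# Build the DocumentTermMatrix only after all tm_map() transformations
dtm <- DocumentTermMatrix(corp)

nDocs(dtm)  # one document per row of x
Terms(dtm)
```

Keeping the corpus and the matrix in separately named objects (corp and dtm here) avoids overwriting the corpus with the matrix, which is what made the later tm_map() calls in the question fail.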
