user3746295

Reputation: 101

R: Problems reading text files

I want to read text files in R. The code used to work, but when I ran it again it failed.

library(tm)

# There are several text files in each of the folders 'Obama' and 'Romney'
candidates <- c("Obama", "Romney")
pathname <- "C:/txt"
s.dir <- sprintf("%s/%s", pathname, candidates)
article <- Corpus(DirSource(directory = s.dir, encoding = "ANSI"))

The error it displays is:

Error in iconv(readLines(x, warn = FALSE), encoding, "UTF-8", "byte") : 
unsupported conversion from 'ANSI' to 'UTF-8' in codepage 936

Also, when I use the code below to try to read a single text file:

m <- "C:/txt/Romney/1.txt"
cc <- Corpus(DirSource(directory = m, encoding = "ANSI"))

It displayed:

Error in DirSource(directory = m, encoding = "ANSI") : empty directory

The file path exists, so why do I get this error?
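
A note on the second error: DirSource() expects a directory, and here m points to a file, so listing its contents returns nothing and tm reports "empty directory". Below is a minimal sketch of two ways around that, assuming a temporary folder as a stand-in for C:/txt/Romney (the folder, file contents, and variable names other than m are hypothetical). The tm part only runs if the package is installed.

```r
# Hypothetical stand-in for "C:/txt/Romney/1.txt": a temp folder with one file.
txt_dir <- file.path(tempdir(), "Romney")
dir.create(txt_dir, showWarnings = FALSE)
m <- file.path(txt_dir, "1.txt")
writeLines("Sample article text.", m)

# m names a file, not a directory, so DirSource(directory = m) has nothing to list.
print(dir.exists(m))
print(dir.exists(dirname(m)))

# Option 1: read the single file directly with base R.
article_text <- readLines(m, warn = FALSE)
print(article_text)

# Option 2 (needs tm): point DirSource at the parent folder and
# select the one file with a regular expression.
if (requireNamespace("tm", quietly = TRUE)) {
  library(tm)
  cc <- VCorpus(DirSource(directory = dirname(m), pattern = "^1\\.txt$"))
  print(length(cc))
}
```

The pattern argument restricts DirSource() to matching file names, so the rest of the folder is ignored.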

Upvotes: 1

Views: 4781

Answers (3)

Emeka

Reputation: 11

s.cor <- Corpus(DirSource(directory = s.dir, encoding = "ANSI"))

I changed encoding = "ANSI" in the line above to encoding = "UTF-8", and it worked:

s.cor <- Corpus(DirSource(directory = s.dir, encoding = "UTF-8"))
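
For context on why the encoding name matters: the error comes from iconv(), which does not accept "ANSI" as an encoding name. "ANSI" is Windows shorthand for the system code page, which is 936 according to the error message and is known to iconv as "CP936" or "GBK". Switching to "UTF-8" should work when the files really are UTF-8 or plain ASCII. The sketch below uses base R only and assumes, purely for illustration, text that is GBK-encoded; the bytes are a made-up sample, not taken from the question's files. You can check which names your platform accepts with iconvlist().

```r
# Two bytes that encode the character U+4E2D in GBK (illustrative sample).
gbk_bytes <- as.raw(c(0xD6, 0xD0))
gbk_text  <- rawToChar(gbk_bytes)

# Convert with an explicit, valid source encoding instead of "ANSI".
utf8_text <- iconv(gbk_text, from = "GBK", to = "UTF-8")

# Inspect the resulting UTF-8 bytes as hex.
utf8_hex <- paste(charToRaw(utf8_text), collapse = " ")
print(utf8_hex)
```

If your files are in the system's native encoding, naming that encoding explicitly in DirSource() (for example encoding = "CP936" on this machine) is the analogous fix, assuming that is how the files were saved.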

Upvotes: 0

Ajitesh

Reputation: 1056

Here is what you need to do:

  1. Change article<-Corpus(DirSource(directory=s.dir,encoding="ANSI")) to the following:

article <- VCorpus(DirSource(directory = s.dir), readerControl = list(reader=readPlain))

  2. In the cleanCorpus function, change corpus.tmp <- tm_map(corpus.tmp, tolower) to the following:

corpus.tmp <- tm_map(corpus.tmp, content_transformer(tolower))

Note the use of the content_transformer function.

With these two changes in place, the problem should be fixed.
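
The two steps above can be tried together in a small sketch. It assumes tm 0.6 or later is installed and uses a temporary folder with one made-up file in place of the Obama and Romney folders; demo_dir and lowered are hypothetical names introduced for illustration.

```r
if (requireNamespace("tm", quietly = TRUE)) {
  library(tm)

  # Hypothetical stand-in for one candidate folder.
  demo_dir <- file.path(tempdir(), "Obama_demo")
  dir.create(demo_dir, showWarnings = FALSE)
  writeLines("Obama Speaks In Ohio", file.path(demo_dir, "1.txt"))

  # Step 1: no encoding argument; readPlain reads each file as plain text.
  article <- VCorpus(DirSource(directory = demo_dir),
                     readerControl = list(reader = readPlain))

  # Step 2: wrap tolower so each result remains a text document
  # rather than a bare character vector.
  corpus.tmp <- tm_map(article, content_transformer(tolower))

  lowered <- as.character(corpus.tmp[[1]])
  print(lowered)
}
```

As far as I can tell, the wrapper matters because tm_map() applies the function to whole document objects, and a plain tolower() returns character vectors that later tm steps may not accept as documents.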

Upvotes: 1

Kasper Christensen

Reputation: 915

Go to cran.r-project.org/web/packages/tm/index.html, download and install an older version of tm, and use that until the bug is fixed.

Upvotes: 0
