Reputation: 101
I want to read text files in R. This code used to work, but when I reran it to retest, it failed.
# There are several text files in each of the folders 'Obama' and 'Romney'
candidates<-c("Obama","Romney")
pathname<-"C:/txt"
s.dir<-sprintf("%s/%s",pathname,candidates)
article<-Corpus(DirSource(directory=s.dir,encoding="ANSI"))
It displays this error:
Error in iconv(readLines(x, warn = FALSE), encoding, "UTF-8", "byte") :
unsupported conversion from 'ANSI' to 'UTF-8' in codepage 936
Also, when I use the code below to try to read a single text file:
m<-"C:/txt/Romney/1.txt"
cc<-Corpus(DirSource(directory=m,encoding="ANSI"))
It displayed:
Error in DirSource(directory = m, encoding = "ANSI") : empty directory
The file path exists, so why am I getting these errors?
Upvotes: 1
Views: 4781
Reputation: 11
s.cor <- Corpus(DirSource(directory = s.dir, encoding = "ANSI"))
I changed encoding = "ANSI" to encoding = "UTF-8", and it worked:
s.cor <- Corpus(DirSource(directory = s.dir, encoding = "UTF-8"))
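For context, here is a minimal sketch of why the encoding name matters. The error comes from iconv(), which only accepts encoding names that the platform's iconv implementation knows, and "ANSI" is usually not one of them (on Windows it loosely means "the system codepage", which the error message reports as 936). The helper name is_known_encoding is hypothetical, and the sample string is made up for illustration; the real encoding of the files in the question is not stated.

```r
# Hypothetical helper: is this a name that iconv() recognises on this machine?
# iconvlist() returns the encoding names known to the local iconv.
is_known_encoding <- function(name) {
  toupper(name) %in% toupper(iconvlist())
}

is_known_encoding("UTF-8")
is_known_encoding("ANSI")  # typically not a recognised name

# Converting with a real encoding name: a Latin-1 byte string to UTF-8.
x <- iconv("caf\xe9", from = "latin1", to = "UTF-8")
Encoding(x)
```

If the files are not actually UTF-8, the value passed to encoding should be whichever name from iconvlist() matches how they were saved.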
Upvotes: 0
Reputation: 1056
Here is what you need to do:
article <- VCorpus(DirSource(directory = s.dir), readerControl = list(reader = readPlain))
article <- tm_map(article, content_transformer(tolower))
Pay attention to the use of content_transformer(). Wrapping a plain function such as tolower in it keeps the results as tm text documents instead of bare character vectors.
With these changes, you should be able to fix the problem.
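As for the "empty directory" error in the question: DirSource() lists the files inside a directory, so passing it the path of a single file yields nothing to list. A rough sketch of one way around that is to pass the containing folder and narrow it down with the pattern argument. The temporary folder and file contents below are stand-ins (assumptions) for C:/txt/Romney so that the example runs anywhere.

```r
library(tm)

# Stand-in for C:/txt/Romney (assumption, for illustration only).
demo_dir <- file.path(tempdir(), "Romney")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("Some Example TEXT", file.path(demo_dir, "1.txt"))
writeLines("Another file", file.path(demo_dir, "2.txt"))

# Pass the folder, not the file, and select one file with a regular expression.
one_file <- VCorpus(
  DirSource(directory = demo_dir, pattern = "^1\\.txt$"),
  readerControl = list(reader = readPlain)
)

# Same transformation as above, wrapped in content_transformer().
one_file <- tm_map(one_file, content_transformer(tolower))

length(one_file)
as.character(one_file[[1]])
```

With the real data, demo_dir would be replaced by "C:/txt/Romney" and the writeLines() calls dropped.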
Upvotes: 1
Reputation: 915
Go to "cran.r-project.org/web/packages/tm/index.html", download and install an older version of tm, and use that until the bug is fixed.
Upvotes: 0