Reputation: 1426
I have data that I crawled using Scrapy, which saves it as a CSV file with encoding utf-8-sig. The data contains many different special characters: Korean, Russian, Chinese, Spanish, ..., a star symbol (ā), and this šµ, and this š...
Scrapy saves these correctly, and I can view them in Notepad++ or in an app like CSVFileView. But when I load the file in R using mydata <- read.csv(<path_to_file>, fileEncoding="UTF-8-SIG", header=FALSE), I get this error:
Error in file(file, "rt", encoding = fileEncoding) :
unsupported conversion from 'UTF-8-SIG' to ''
If I don't specify the encoding, the file loads, but the symbols turn into characters like â˜ and the first column header is prefixed with ï..
Which encoding should I choose so that all characters are read correctly?
Upvotes: 0
Views: 2746
Reputation: 34441
As the input is already encoded as UTF-8, you should use the encoding argument to read the file as-is. Using fileEncoding will attempt to re-encode the file.
mydata <- read.csv(<path_to_file>, encoding="UTF-8", header=FALSE)
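One caveat that may be worth checking: UTF-8-SIG is the name Python uses for UTF-8 with a leading byte order mark (BOM), while R's documentation for file() lists UTF-8-BOM as the name it accepts for reading such files. The following is a minimal sketch of that variant, assuming a small made-up sample file written to a temporary path (the file contents and the column names name and symbol are hypothetical, not taken from the question):

```r
# Hypothetical sample: a UTF-8 CSV that begins with a byte order mark (BOM),
# written byte-by-byte so the example does not depend on the local encoding.
path <- tempfile(fileext = ".csv")
con <- file(path, "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)               # BOM bytes
writeBin(charToRaw("name,symbol\nstar,\u2605\n"), con)   # UTF-8 body
close(con)

# "UTF-8-BOM" asks R to drop the BOM while reading, so the first
# column name is not prefixed with stray characters.
mydata <- read.csv(path, fileEncoding = "UTF-8-BOM", header = TRUE)
print(names(mydata))
```

If the ï.. prefix is still there with encoding="UTF-8" alone, this is the variant to try. Note that fileEncoding converts the text to the session's native encoding, so in a session that is not running in a UTF-8 locale some symbols may still not display correctly.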
Upvotes: 1