hydradon

Reputation: 1426

unable to read csv file saved with encoding "UTF-8-SIG"

I have data that I crawled with Scrapy, which saves it as a CSV file with utf-8-sig encoding. The data contains many special characters: Korean, Russian, Chinese, Spanish, a star symbol (★), 🎵, 🎄, and so on.

Scrapy saves the file without problems, and I can view the characters in Notepad++ or an app like CSVFileView. But when I load the file in R using mydata <- read.csv(<path_to_file>, fileEncoding="UTF-8-SIG", header=FALSE), I get this error:

Error in file(file, "rt", encoding = fileEncoding) : 
  unsupported conversion from 'UTF-8-SIG' to ''

If I don't specify the encoding, the file loads, but the symbols turn into characters like â˜ and the first column header is prefixed with ï..

Which encoding should I choose to include all characters?

Upvotes: 0

Views: 2746

Answers (1)

lroha

Reputation: 34441

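As the input is already encoded as UTF-8, use the encoding argument, which reads the bytes as-is and marks the resulting strings as UTF-8. fileEncoding instead attempts to re-encode the file, and "UTF-8-SIG" is not an encoding name that R's conversion recognizes, hence the error.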

mydata <- read.csv(<path_to_file>, encoding="UTF-8", header=FALSE)
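To try this without the original data, here is a minimal sketch that builds a tiny stand-in file starting with a UTF-8 byte order mark (the signature that "utf-8-sig" writes) and reads it back the same way. The temporary path, the two-column layout, and the cleanup of the first cell are assumptions for illustration only. Depending on the R version and locale, the BOM may or may not be dropped automatically, so the sketch removes it from the first cell if it is still there.

```r
# Hypothetical stand-in for the real file: BOM + "id,symbol" + one data row.
path <- tempfile(fileext = ".csv")
bom  <- as.raw(c(0xEF, 0xBB, 0xBF))   # UTF-8 byte order mark
star <- as.raw(c(0xE2, 0x98, 0x85))   # UTF-8 bytes of U+2605 (black star)
writeBin(c(bom, charToRaw("id,symbol\n1,"), star, charToRaw("\n")), path)

# Read the bytes as-is and mark non-ASCII strings as UTF-8 (no re-encoding).
mydata <- read.csv(path, encoding = "UTF-8", header = FALSE,
                   stringsAsFactors = FALSE)

# If the BOM (U+FEFF) is still attached to the first cell, strip it.
mydata[1, 1] <- sub("^\ufeff", "", mydata[1, 1])

print(mydata)
```

R's connections also accept fileEncoding = "UTF-8-BOM", which drops the BOM while reading. That route re-encodes the input to the session's native encoding, so it should only be expected to keep every character when R runs in a UTF-8 locale.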

Upvotes: 1
