Mace

Reputation: 1269

UTF-8 file encoding in R

I have a .csv file that should be UTF-8 encoded. I exported it from SQL Server Management Studio. However, importing it into R fails on the lines containing ÿ. I use read.csv2 and specify the file encoding "UTF-8-BOM".

Notepad++ correctly displays the ÿ and says it is UTF-8 encoding. Is this a bug with the R encoding, or is ÿ in fact not part of the UTF-8 encoding scheme?

I have uploaded a small tab-delimited .txt file that reproduces the failure: https://www.dropbox.com/s/i2d5yj8sv299bsu/TestData.txt

Thanks

Upvotes: 1

Views: 1081

Answers (1)

aled

Reputation: 25837

That is probably part of the byte order mark (BOM) at the beginning of the file. An editor or parser that doesn't recognize a BOM treats those bytes as garbage characters. See https://www.ultraedit.com/support/tutorials-power-tips/ultraedit/unicode.html for more details.
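One way to check this is to look at the first few bytes of the file. A UTF-8 BOM is the bytes EF BB BF, while a UTF-16 little-endian BOM is FF FE, which shows up as ÿþ when misread as Latin-1. The sketch below is only an illustration: `detect_bom` is a hypothetical helper name, not something from base R, and the strings it returns are meant as candidate values for `fileEncoding`, whose support may depend on your platform.

```r
# Hypothetical helper: report which byte order mark (if any) starts a file.
# Returns a candidate encoding name, or NA if no known BOM is found.
detect_bom <- function(path) {
  first <- readBin(path, what = "raw", n = 3L)
  starts_with <- function(sig) {
    length(first) >= length(sig) && all(first[seq_along(sig)] == sig)
  }
  if (starts_with(as.raw(c(0xEF, 0xBB, 0xBF)))) {
    "UTF-8-BOM"
  } else if (starts_with(as.raw(c(0xFF, 0xFE)))) {
    # Note: a UTF-32LE BOM also begins with FF FE; this sketch ignores that case.
    "UTF-16LE"
  } else if (starts_with(as.raw(c(0xFE, 0xFF)))) {
    "UTF-16BE"
  } else {
    NA_character_
  }
}

# Usage, assuming the sample file has been downloaded as "TestData.txt".
path <- "TestData.txt"
if (file.exists(path)) {
  enc <- detect_bom(path)
  print(enc)
  if (!is.na(enc)) {
    # read.delim is used because the sample file is tab-delimited.
    dat <- read.delim(path, fileEncoding = enc)
    str(dat)
  }
}
```

If this reports no BOM at all, the ÿ is more likely real data, and the file may simply be in a different encoding than the one passed to R.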

Upvotes: 0
