Reputation: 1269
I am trying to read a csv file generated by SQL Server Management Studio and encoded as UTF-8 (I chose that option when saving it) into R version 3.0.1 (x64) through read.csv2(). I can't get R to display special characters correctly.
If I set fileEncoding="UTF-8-BOM", the import stops at the line where I have a ÿ. However, when I open the file in Notepad++, the ÿ is displayed correctly with UTF-8 encoding. I have tried without setting fileEncoding, but then the special characters aren't displayed correctly (of course).
The csv file is available here: https://www.dropbox.com/s/7y47i826ikq8ahi/Data.csv
How do I read the csv file and display the text in the right encoding?
Thanks!!
Upvotes: 4
Views: 20657
Reputation: 5250
In my case, I had this issue in R inside a Docker container (Debian and R). When I ran locale in the container, all variables appeared empty. I solved the problem by adding this to the Dockerfile:
ENV LANG=en_US.UTF-8
ENV LC_CTYPE=en_US.UTF-8
ENV LC_NUMERIC=es_AR.UTF-8
ENV LC_TIME=es_AR.UTF-8
ENV LC_COLLATE=en_US.UTF-8
ENV LC_MONETARY=es_AR.UTF-8
ENV LC_MESSAGES=en_US.UTF-8
ENV LC_PAPER=es_AR.UTF-8
ENV LC_NAME=es_AR.UTF-8
ENV LC_ADDRESS=es_AR.UTF-8
ENV LC_TELEPHONE=es_AR.UTF-8
ENV LC_MEASUREMENT=es_AR.UTF-8
ENV LC_IDENTIFICATION=es_AR.UTF-8
ENV LC_ALL=C.UTF-8
I have es_AR in some values, but I think en_US or another locale should work.
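To confirm that the settings above actually reach R, the session's locale can be inspected from inside the container. This is a minimal sketch using only base R; it assumes nothing beyond the variables named in the Dockerfile. Two things to keep in mind: when LC_ALL is set, it takes precedence over the other LC_* variables and LANG, and a locale name only takes effect if that locale is available in the image.

```r
# Show the locale categories R is actually using in this session.
print(Sys.getlocale())

# l10n_info() reports whether the session treats text as UTF-8.
info <- l10n_info()
print(info[["UTF-8"]])

# The environment variables from the Dockerfile, as seen by R.
print(Sys.getenv(c("LANG", "LC_ALL", "LC_CTYPE")))
```

If the UTF-8 entry is FALSE, the container is probably still running in a non-UTF-8 locale and the ENV lines are worth rechecking.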
Upvotes: 0
Reputation: 10152
To those who are still stuck with this issue: my scripts were able to recognise "Umlaute" (ä, ö, ü, or ß) once I included a line at the top of the script that changes the default option for character encoding: options(encoding = "UTF-8")
(In my case, setting the option in RStudio directly didn't affect the encoding!)
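As a rough sketch of what that line does: options() returns the previous values, so the old default can be kept and restored later. The variable names old and current are only illustrative.

```r
# Put this at the top of the script; options() returns the previous values.
old <- options(encoding = "UTF-8")

# Connections opened without an explicit encoding now default to UTF-8.
current <- getOption("encoding")
print(current)

# Optional: restore the previous default at the end of the script.
options(old)
```

Restoring the old value is optional, but it avoids surprising other code that runs later in the same session.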
Upvotes: 2
Reputation: 1269
I found the answer myself. The problem was with the transformation from UTF-8 to the system locale (the default encoding in R) through fileEncoding. As I use RStudio, I just changed the default encoding to UTF-8 and removed fileEncoding="UTF-8-BOM" from read.csv2. Then the entire csv file was read, and RStudio displays all characters correctly.
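A related approach that does not depend on RStudio settings is to pass encoding = "UTF-8" instead of fileEncoding. This is only a sketch under assumptions: the sample file and its column names are made up for illustration, and it is not the exact fix described above. The encoding argument marks the strings as UTF-8 without converting them to the system locale. If the real file starts with a BOM, it may show up in the first column name and need cleaning separately.

```r
# Write a small semicolon-separated file as raw UTF-8 bytes
# (hypothetical sample data, standing in for the real export).
path <- tempfile(fileext = ".csv")
lines <- c("id;name", "1;\u00ff", "2;\u00e4\u00f6\u00fc")
writeLines(enc2utf8(lines), path, useBytes = TRUE)

# encoding = "UTF-8" only marks the strings that are read as UTF-8;
# no conversion to the system locale takes place, unlike with fileEncoding.
df <- read.csv2(path, encoding = "UTF-8", stringsAsFactors = FALSE)

# Inspect how R has marked the values that were read.
print(Encoding(df$name))
print(nrow(df))
```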
Upvotes: 5