Reputation: 145
I'm working on a dataset of locations where some location names use local characters. Most characters are displayed correctly, but I'm having issues with some Romanian characters, such as "ș".
I have tried changing my Windows 10 64-bit system locale to use UTF-8 encoding, but that did not solve the issue.
A sample file can be found here for testing: https://drive.google.com/file/d/1T7QQQ7G_dA_rXD9Ewf51uuQ6CUkscjP_/view?usp=sharing
This line imports the data:
df <- read.delim("R_Encode_Issue.csv", header=TRUE, sep=",", encoding = "UTF-8", colClasses=c("character","character","character"))
> df
region country chapter
1 Europe Moldova Chi<U+0219>inau
This displays the location chapter as "Chi<U+0219>inau" (Stack Overflow can't even display this :D) both in the console and in the viewer.
If I convert the data frame to a tibble:
df2 <- as_tibble(df)
> df2
# A tibble: 1 x 3
region country chapter
<chr> <chr> <chr>
1 Europe Moldova Chișinău
The console displays the location chapter as "Chișinău", but the viewer shows "Chi<U+0219>inau".
I write the data to a .csv file:
write.csv(df2, file = "R_Encode_Out.csv",row.names=FALSE, na="", fileEncoding = "UTF-8")
The location chapter is written as "Chi<U+0219>inau" in the output file.
R version:
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 5.3
year 2019
month 03
day 11
svn rev 76217
language R
version.string R version 3.5.3 (2019-03-11)
nickname Great Truth
RStudio version:
$mode
[1] "desktop"
$version
[1] ‘1.1.463’
I expected the viewer, or at least the written file, to display the characters correctly when I use UTF-8 as the encoding on import and export. Instead, the characters are exported incorrectly.
Any insight on what I can do to correct this?
Upvotes: 0
Views: 745
Reputation: 5138
Try using different import and export functions than the base R ones. I got this to work using readr: the exported file is correct, although the viewer still seems to display the value as Chi<U+0219>inau. The exported file opens correctly in Notepad, and in Excel if I specify that it has UTF-8 encoding.
library(readr)
df <- read_csv("C:/Users/Andrew/Downloads/R_Encode_Issue.csv", locale = locale(encoding = "UTF-8"))
df
# A tibble: 1 x 3
region country chapter
<chr> <chr> <chr>
1 Europe Moldova Chișinău
write_csv(df, "C:/Users/Andrew/Desktop/R_Encode_Issue.csv")
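Since the console and the viewer can print the same string differently on Windows, it may be easier to check the written file directly than to trust what is displayed. Below is a minimal sketch of such a check, assuming readr is installed. The one-row data frame and the temporary path are made up for illustration and stand in for the sample file from the question.

```r
library(readr)

# Hypothetical one-row data set mirroring the question's sample file.
df <- data.frame(
  region  = "Europe",
  country = "Moldova",
  chapter = "Chi\u0219in\u0103u",  # \u escapes keep this script ASCII-only
  stringsAsFactors = FALSE
)

# Temporary path chosen for illustration only.
path <- tempfile(fileext = ".csv")
write_csv(df, path)  # readr encodes strings as UTF-8 when writing

# Inspect the raw bytes so the check does not depend on console printing.
bytes <- readBin(path, what = "raw", n = file.size(path))

# U+0219 is encoded in UTF-8 as the byte pair C8 99; look for that pair.
n <- length(bytes)
has_s_comma <- any(bytes[-n] == as.raw(0xC8) & bytes[-1] == as.raw(0x99))

# Round trip: the code points read back should match the ones written.
back <- read_csv(
  path,
  locale = locale(encoding = "UTF-8"),
  col_types = cols(.default = col_character())
)
round_trip_ok <- identical(utf8ToInt(back$chapter), utf8ToInt(df$chapter))
```

If both has_s_comma and round_trip_ok are TRUE, the file on disk holds the real character, and any <U+0219> you still see is only a display issue in the console or viewer.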
Upvotes: 2