Reputation: 513
The Serbian alphabet has five additional letters (š, đ, ž, č, ć) on top of the English alphabet. The problem is that R won’t recognize č and ć. The characters š, đ, and ž work fine, but whenever I try to use č or ć, R interprets them as c.
>š
Error: object 'š' not found
>ž
Error: object 'ž' not found
>đ
Error: object 'd' not found
>č
function (..., recursive = FALSE) .Primitive("c")
>ć
function (..., recursive = FALSE) .Primitive("c")
When I read files into R, it always substitutes c for č and ć.
Is there any way around this?
>Sys.getlocale()
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
Upvotes: 4
Views: 1291
Reputation: 4638
Changing the system locale to the specific language will probably help. Using the "UTF-8" encoding should preserve the special characters. When you read a file:
read.table("file.txt",encoding="UTF-8")
If you are writing a file, you can do something like this:
con <- file("path/filename.txt", open = "w", encoding = "UTF-8")
write(x, file = con)
close(con)  # release the connection once the data has been written
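For context, the locale in the question uses code page 1252, which contains š and ž but not č, ć, or đ, which is why those letters fall back to c and d. A round trip that does not depend on the session locale can be sketched as below. It is only a sketch: the names `words`, `path`, and `back` are made up for illustration, the sample words are not from the question, and the file is a temporary one. It writes the raw UTF-8 bytes with `useBytes = TRUE` and marks the strings as UTF-8 when reading them back.

```r
# Hypothetical sample words; \u escapes keep this script ASCII-only,
# so it does not depend on how the editor saved the file.
words <- c("\u010Daj", "ku\u0107a", "\u0111ak")  # contain č, ć, đ

path <- tempfile(fileext = ".txt")

# Write the UTF-8 bytes as they are, without converting to the native encoding.
con <- file(path, open = "w")
writeLines(enc2utf8(words), con, useBytes = TRUE)
close(con)

# Read the lines back and mark them as UTF-8.
back <- readLines(path, encoding = "UTF-8")

# Stops with an error if any character was altered on the way.
stopifnot(identical(enc2utf8(back), enc2utf8(words)))
```

If the check passes, the letters survived the write and the read even when the console itself cannot display them.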
Upvotes: 1