magasr

Reputation: 513

R doesn't accept certain Serbian characters with diacritics (č, ć)

The Serbian alphabet has 5 additional letters (š, đ, ž, č, ć) on top of the English alphabet. The problem is that R won't recognize č and ć. The characters š, đ, and ž work fine, but whenever I try to use č and ć, R interprets them as c.

>š
Error: object 'š' not found
>ž
Error: object 'ž' not found
>đ
Error: object 'd' not found
>č
function (..., recursive = FALSE)  .Primitive("c")
>ć
function (..., recursive = FALSE)  .Primitive("c")

When I read files into R, it always substitutes č and ć with c.

Is there any way around this?

>Sys.getlocale()
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

Upvotes: 4

Views: 1291

Answers (1)

user5249203

Reputation: 4638

Changing the system locale to the specific language will probably help. Using the "UTF-8" encoding should preserve the special characters when you read a file:

  read.table("file.txt",encoding="UTF-8")

If you are writing a file, you can do something like this:

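  # open the connection for writing, re-encoding the output as UTF-8
  con <- file("path/filename.txt", open = "w", encoding = "UTF-8")
  write(x, file = con)
  close(con)  # release the connection when done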
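
To check whether the characters actually survive a write and a read, a small round trip like the one below may help. It is only a sketch: the vector `x` and the temporary file are made-up sample data, and it swaps in a slightly different write technique (converting with `enc2utf8()` and writing the bytes unchanged via `useBytes = TRUE`) because, in a non-UTF-8 locale such as `English_United States.1252`, text may otherwise pass through the native encoding, which has no č or ć.

```r
# Hypothetical sample data; "\u" escapes keep this script ASCII-only,
# so the script file's own encoding cannot interfere.
x <- c("\u010d", "\u0107", "\u0161", "\u0111", "\u017e")  # č, ć, š, đ, ž

# Is the current session running in a UTF-8 locale?
is_utf8_session <- isTRUE(l10n_info()[["UTF-8"]])

# Temporary file used only for this demonstration.
path <- tempfile(fileext = ".txt")

# Write the strings as raw UTF-8 bytes, skipping any re-encoding
# through the native locale.
con <- file(path, open = "wb")
writeLines(enc2utf8(x), con, useBytes = TRUE)
close(con)

# Read the lines back and mark them as UTF-8.
y <- readLines(path, encoding = "UTF-8")

# TRUE if every character survived the round trip.
round_trip_ok <- identical(x, y)
round_trip_ok
```

If `round_trip_ok` is `TRUE` but the console still shows `c`, the data is intact and only the display of the characters is affected by the locale.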

Upvotes: 1
