User1811

How to make Cyrillic datasets recognizable?

One dataset from the Russian Election Studies is written in Cyrillic, but R does not display the Cyrillic letters correctly: when I open the data in the viewer, I see strange symbols instead. How can I convert the dataset so that R recognizes the Cyrillic text?

Here is what I have already tried, none of which helped:

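library(foreign)  # provides read.spss()

rusdata <- read.spss("RES 2007-2008.sav", to.data.frame = TRUE)

# Attempt 1: switch the whole locale to Russian (Windows locale name)
Sys.setlocale(locale = "Russian")
View(rusdata)

# Attempt 2: POSIX-style locale name
Sys.setlocale(locale = "ru_RU")
View(rusdata)

# Attempt 3: change only the character-type category
Sys.setlocale("LC_CTYPE", "russian")
encoding = "utf-8"  # note: this only creates a variable named `encoding`
View(rusdata)

# Attempt 4: Russian UTF-8 locale
Sys.setlocale("LC_CTYPE", "ru_RU.UTF-8")
View(rusdata)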

I would really welcome your help!

Jakub.Novotny

You could try a different package, such as haven::read_sav. Alternatively, I would first use stringi::stri_enc_detect to detect the encoding. I assume you can copy part of the text from SPSS. Here is an example:

a <- "Статья 1;Все люди рождаются"
stringi::stri_enc_detect(a)
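stri_enc_detect() returns, for each input, a data frame of candidate encodings with Encoding, Language and Confidence columns, most likely candidate first. It inspects the raw bytes, so a string R has already decoded (like `a` above) will probably just be reported as UTF-8. The sketch below therefore simulates text stored as Windows-1251 bytes; that encoding is an assumption for illustration, not something known about the file. It also uses a longer sentence, since detection on very short snippets is unreliable.

```r
library(stringi)

# Assumption: the SPSS strings are stored as Windows-1251 bytes.
# Build such bytes from a UTF-8 sentence to simulate the file's content.
text <- "Статья 1; Все люди рождаются свободными и равными в своем достоинстве и правах."
cp1251_bytes <- iconv(text, from = "UTF-8", to = "CP1251", toRaw = TRUE)[[1]]

# One data frame of guesses per input (here a single raw vector)
candidates <- stri_enc_detect(cp1251_bytes)[[1]]
print(candidates)

# The first row is the best guess, the value you would pass on as `reencode =`
best_guess <- candidates$Encoding[1]
print(best_guess)
```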

Then I would pass the encoding returned by stri_enc_detect to the reencode argument of read.spss:

library(foreign)
rusdata <- read.spss("RES 2007-2008.sav", to.data.frame = TRUE, reencode = "encoding goes here")
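If the data has already been loaded without re-encoding, another route is to convert the affected columns afterwards with base R's iconv(). The helper below, `reencode_df()`, is hypothetical (it is not part of foreign or haven) and assumes the strings arrived as unconverted Windows-1251 (`CP1251`) bytes, which you would want to confirm with stri_enc_detect first. If you read the file with haven::read_sav instead, its `encoding` argument plays the same role as `reencode` here.

```r
# Hypothetical helper (not from foreign or haven): convert every character
# or factor column of a data frame from `from` to `to` with base R iconv().
# Assumes the text was read in as unconverted bytes in encoding `from`.
reencode_df <- function(df, from = "CP1251", to = "UTF-8") {
  for (col in names(df)) {
    x <- df[[col]]
    if (is.factor(x)) {
      # Convert only the level labels; the integer codes stay unchanged
      levels(x) <- iconv(levels(x), from = from, to = to)
    } else if (is.character(x)) {
      x <- iconv(x, from = from, to = to)
    }
    df[[col]] <- x
  }
  df
}

# Simulate garbled input: Cyrillic text stored as CP1251 bytes
answers <- c("Статья 1", "Все люди рождаются")
q2 <- factor(answers)
levels(q2) <- iconv(levels(q2), from = "UTF-8", to = "CP1251")
garbled <- data.frame(
  q1 = iconv(answers, from = "UTF-8", to = "CP1251"),
  q2 = q2,
  stringsAsFactors = FALSE
)

fixed <- reencode_df(garbled)
print(fixed)
```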
