NCC1701

Reputation: 149

tolower() does not work for a Turkish character

"İ" is a special and frequently used letter in Turkish.
I have text data of several types. The messages contain Turkish characters as well as their transliterated (ASCII) versions, in mixed case. I have to lowercase and transliterate the text to make it uniform. However, when I lowercase the text, the letter "İ" is not converted to "i" and stays uppercase. That's why the code below can't find it.

fromDB$message<-mgsub(fromDB$message,c("ğ","ö","ş","ü","ç","ı"),c("g","o","s","u","c","i"),useBytes = FALSE)

The "x" below is an example message. Why can't tolower() convert the "İ" characters in it to lowercase?

My current locale setting is Sys.setlocale("LC_CTYPE", "turkish").

x<-c("Sn. İLETİŞİM BİLGİLERİNİZ GUNCELLENMISTIR.")
x<-tolower(x)
x
[1] "sn. İletİşİm bİlgİlerİnİz guncellenmistir."

Let me also add it as a picture, because the output may not be the same on every computer.

tolower_my_computer
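Since the result of tolower() depends on the platform and locale, it can help to confirm which code point and encoding R actually sees. The following is a minimal diagnostic sketch using only base R. The variable names dotted_I and code_point are only for illustration, and the tolower() result is deliberately left without an expected value because it varies between systems.

```r
# Build the letter from its Unicode escape so the script does not depend on
# the encoding of the source file.
dotted_I <- "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

Sys.getlocale("LC_CTYPE")   # the locale that tolower() will use
l10n_info()                 # reports whether the session is multibyte / UTF-8
Encoding(dotted_I)          # declared encoding of the string

# Convert to UTF-8 explicitly before asking for the code point.
code_point <- utf8ToInt(enc2utf8(dotted_I))
code_point                  # 304, i.e. U+0130

# Locale-dependent: may or may not return "i" on a given machine.
tolower(dotted_I)
```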

@drammock proposed a solution in the question below.

Decapitalize UTF-8 special characters in R

I tried it, but this time the letters "I" in the last word were converted to "ı" instead of "i", as shown below.

library(stringi)
x<-c("Sn. İLETİŞİM BİLGİLERİNİZ GUNCELLENMISTIR.")
x<-stri_trans_tolower(x, locale="tr_TR")
x
[1] "sn. iletişim bilgileriniz guncellenmıstır."
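This output is consistent with Turkish casing rules, under which "I" lowercases to the dotless "ı" and "İ" lowercases to "i". A small comparison, assuming stringi is installed, illustrates why neither a Turkish nor a non-Turkish locale handles this mixed text on its own. The "en_US" locale is used here only as an example of a non-Turkish locale, and the variable names are illustrative.

```r
library(stringi)

# Turkish rules: ASCII "I" becomes dotless "ı" (U+0131), "İ" becomes "i".
tr_I     <- stri_trans_tolower("I", locale = "tr_TR")
tr_dot_I <- stri_trans_tolower("\u0130", locale = "tr_TR")

# Non-Turkish rules: "I" becomes "i", but "İ" becomes "i" followed by a
# combining dot above (U+0307), which is still not a plain ASCII "i".
en_I     <- stri_trans_tolower("I", locale = "en_US")
en_dot_I <- stri_trans_tolower("\u0130", locale = "en_US")

# Show the results with non-ASCII characters escaped, so the difference is
# visible regardless of the console font or encoding.
stri_escape_unicode(c(tr_I, tr_dot_I, en_I, en_dot_I))
```

Because the example message mixes Turkish "İ" with already transliterated ASCII "I", a single locale-aware lowercasing step cannot map both to "i".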

It may be useful to add it as a picture again.

Expected output: "sn. iletisim bilgileriniz guncellenmistir."
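One way to reach this expected output is to map the Turkish letters (both cases) and the ASCII capitals directly to lowercase ASCII, so that no locale-specific case rule is involved. This is a sketch in base R under stated assumptions, not a tested fix for every setup: the helper name to_ascii_lower is hypothetical, and in Windows sessions that do not use UTF-8 the behaviour of chartr() may differ.

```r
# Hypothetical helper (not from the original post): translate characters one
# to one with chartr(), which applies a fixed mapping instead of locale rules.
to_ascii_lower <- function(x) {
  # Turkish letters, written as escapes: İ ı Ğ ğ Ö ö Ş ş Ü ü Ç ç
  turkish <- "\u0130\u0131\u011e\u011f\u00d6\u00f6\u015e\u015f\u00dc\u00fc\u00c7\u00e7"
  ascii   <- "iiggoossuucc"

  # Append A-Z -> a-z so that ASCII capitals are lowercased by the same mapping.
  old <- paste0(turkish, paste(LETTERS, collapse = ""))
  new <- paste0(ascii, paste(letters, collapse = ""))

  chartr(old, new, x)
}

x <- "Sn. \u0130LET\u0130\u015e\u0130M B\u0130LG\u0130LER\u0130N\u0130Z GUNCELLENMISTIR."
result <- to_ascii_lower(x)
result
# [1] "sn. iletisim bilgileriniz guncellenmistir."
```

The same helper could be applied to the whole column, for example fromDB$message <- to_ascii_lower(fromDB$message), which would replace both the tolower() call and the mgsub() step.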


Upvotes: 1

Views: 66

Answers (0)
