u.ahmed

Reputation: 1

Problem printing Russian characters when rendering .docx in R Markdown

I am trying to render a .docx file with R Markdown. The document includes a data frame that contains Russian characters.

---
params: 
title: "Encoding Issue"
output:
 bookdown::word_document2:
   reference_docx: Word_template.docx
classoption: a4paper
always_allow_html: yes
lang: ru
---

```{r data}
df <- data.frame(x = 
c("Не хватка медикаментов",
"Далеко ехать",
"Опасность на дорогах к мед.учереждению",
"Only one family physician")
)
Encoding(df$x) <- "UTF-8"
```

```{r cat}
cat(df$x)
```

�� ������ ������������ ������ ����� ��������� �� ������� � ���.����������� Only one family physician

```{r print}
print(df$x)
```

[1] "\xcd\xe5 \xf5\xe2\xe0\xf2\xea\xe0 \xec\xe5\xe4\xe8\xea\xe0\xec\xe5\xed\xf2\xee\xe2"
[2] "\xc4\xe0\xeb\xe5\xea\xee \xe5\xf5\xe0\xf2\xfc"
[3] "\xce\xef\xe0\xf1\xed\xee\xf1\xf2\xfc \xed\xe0 \xe4\xee\xf0\xee\xe3\xe0\xf5 \xea \xec\xe5\xe4.\xf3\xf7\xe5\xf0\xe5\xe6\xe4\xe5\xed\xe8\xfe"
[4] "Only one family physician"

The rendered .docx file shows the printed result as

[1] "<U+041D><U+0435> <U+0445><U+0432><U+0430><U+0442><U+043A><U+0430> <U+043C><U+0435><U+0434><U+0438><U+043A><U+0430><U+043C><U+0435><U+043D><U+0442><U+043E><U+0432>"
[2] "<U+0414><U+0430><U+043B><U+0435><U+043A><U+043E> <U+0435><U+0445><U+0430><U+0442><U+044C>"
[3] "<U+041E><U+043F><U+0430><U+0441><U+043D><U+043E><U+0441><U+0442><U+044C> <U+043D><U+0430> <U+0434><U+043E><U+0440><U+043E><U+0433><U+0430><U+0445> <U+043A> <U+043C><U+0435><U+0434>.<U+0443><U+0447><U+0435><U+0440><U+0435><U+0436><U+0434><U+0435><U+043D><U+0438><U+044E>"
[4] "Only one family physician"

The output of Sys.getlocale() is:

"LC_COLLATE=Russian_Russia.1251;LC_CTYPE=Russian_Russia.1251;LC_MONETARY=Russian_Russia.1251;LC_NUMERIC=C;LC_TIME=Russian_Russia.1251"

What could be the origin of the encoding issue? Is there any way to render the .docx file with the correct characters? The expected output is:

Не хватка медикаментов Далеко ехать Опасность на дорогах к мед.учереждению Only one family physician

I have also tried Sys.setlocale("LC_CTYPE", "English"), with no change. The .docx template is set to "UTF-8", and I have also set options(encoding = "UTF-8") in the R Markdown document.

Upvotes: 0

Views: 212

Answers (1)

manro

Reputation: 3677

You should use enc2utf8() in this situation, instead of setting Encoding() by hand:

```{r data}
df <- data.frame(x = 
c("Нехватка медикаментов",
"Далеко ехать",
"Опасность на дорогах к мед.учреждению",
"Only one family physician")
)
```

```{r}
enc2utf8(df$x)
```
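To illustrate why this may help, here is a minimal sketch, assuming the strings reach R as Windows-1251 bytes (which the `\xc4\xe0...` escapes in the question suggest). `Encoding<-` only changes the declared mark and leaves the bytes untouched, whereas a conversion function re-encodes them. The variable names are hypothetical, and `iconv()` is used here in place of `enc2utf8()` only so that the example does not depend on the session locale.

```r
# Bytes for "Далеко" in Windows-1251, taken from the print() output in the question.
cp1251 <- rawToChar(as.raw(c(0xc4, 0xe0, 0xeb, 0xe5, 0xea, 0xee)))

# Encoding<- only relabels the string; the bytes are still CP1251,
# so they do not form a valid UTF-8 sequence.
mislabelled <- cp1251
Encoding(mislabelled) <- "UTF-8"
validUTF8(mislabelled)                                   # FALSE
identical(charToRaw(mislabelled), charToRaw(cp1251))     # TRUE: bytes unchanged

# iconv() actually converts the bytes from CP1251 to UTF-8.
converted <- iconv(cp1251, from = "CP1251", to = "UTF-8")
validUTF8(converted)                                     # TRUE
utf8ToInt(converted)                                     # code points of "Далеко"
```

In a session whose native encoding is CP1251, as in the question, `enc2utf8()` should perform the same conversion for strings that carry no encoding mark. That suggests removing the `Encoding(df$x) <- "UTF-8"` line and storing the result, for example `df$x <- enc2utf8(df$x)`, although whether this alone fixes the rendered .docx may depend on the R version and on how the .Rmd file itself is saved.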

Upvotes: 0
