Reputation: 1
I am trying to render a .docx file with R Markdown that includes Russian characters in a data frame:
---
params:
title: "Encoding Issue"
output:
  bookdown::word_document2:
    reference_docx: Word_template.docx
classoption: a4paper
always_allow_html: yes
lang: ru
---
```{r data}
df <- data.frame(x =
  c("Не хватка медикаментов",
    "Далеко ехать",
    "Опасность на дорогах к мед.учереждению",
    "Only one family physician")
)
Encoding(df$x) <- "UTF-8"
```
```{r cat}
cat(df$x)
```
�� ������ ������������ ������ ����� ��������� �� ������� � ���.����������� Only one family physician
```{r print}
print(df$x)
```
[1] "\xcd\xe5 \xf5\xe2\xe0\xf2\xea\xe0 \xec\xe5\xe4\xe8\xea\xe0\xec\xe5\xed\xf2\xee\xe2"
[2] "\xc4\xe0\xeb\xe5\xea\xee \xe5\xf5\xe0\xf2\xfc"
[3] "\xce\xef\xe0\xf1\xed\xee\xf1\xf2\xfc \xed\xe0 \xe4\xee\xf0\xee\xe3\xe0\xf5 \xea \xec\xe5\xe4.\xf3\xf7\xe5\xf0\xe5\xe6\xe4\xe5\xed\xe8\xfe"
[4] "Only one family physician"
The rendered .docx file shows the printed result as:
[1] "<U+041D><U+0435> <U+0445><U+0432><U+0430><U+0442><U+043A><U+0430> <U+043C><U+0435><U+0434><U+0438><U+043A><U+0430><U+043C><U+0435><U+043D><U+0442><U+043E><U+0432>"
[2] "<U+0414><U+0430><U+043B><U+0435><U+043A><U+043E> <U+0435><U+0445><U+0430><U+0442><U+044C>"
[3] "<U+041E><U+043F><U+0430><U+0441><U+043D><U+043E><U+0441><U+0442><U+044C> <U+043D><U+0430> <U+0434><U+043E><U+0440><U+043E><U+0433><U+0430><U+0445> <U+043A> <U+043C><U+0435><U+0434>.<U+0443><U+0447><U+0435><U+0440><U+0435><U+0436><U+0434><U+0435><U+043D><U+0438><U+044E>"
[4] "Only one family physician"
`Sys.getlocale()` returns:
"LC_COLLATE=Russian_Russia.1251;LC_CTYPE=Russian_Russia.1251;LC_MONETARY=Russian_Russia.1251;LC_NUMERIC=C;LC_TIME=Russian_Russia.1251"
Where does the encoding issue originate? Is there a way to render the .docx file with the correct characters, as shown here?
Не хватка медикаментов Далеко ехать Опасность на дорогах к мед.учереждению Only one family physician
I have also tried `Sys.setlocale("LC_CTYPE", "English")`. The .docx template is set to "UTF-8", and the R Markdown document also sets `options(encoding = "UTF-8")`.
Upvotes: 0
Views: 212
Reputation: 3677
You should use `enc2utf8()` in this situation. It converts the strings from the native encoding to UTF-8:
```{r data}
df <- data.frame(x =
  c("Нехватка медикаментов",
    "Далеко ехать",
    "Опасность на дорогах к мед.учреждению",
    "Only one family physician")
)
```
```{r}
enc2utf8(df$x)
```
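To illustrate why relabelling and converting differ, here is a minimal sketch. It assumes the strings are stored as Windows-1251 bytes, as the `\xcd\xe5` escapes in the question's `print()` output suggest, and that the platform's `iconv` recognises the name `"CP1251"`. It uses `iconv()` with an explicit source encoding rather than `enc2utf8()` so that it behaves the same regardless of the session locale. The variable names are illustrative only.

```r
# Bytes 0xCD 0xE5 are the first two characters of the first string ("Не")
# in Windows-1251, taken from the print() output in the question.
x_cp1251 <- rawToChar(as.raw(c(0xcd, 0xe5)))

# Encoding<- only changes the declared encoding; the bytes stay the same,
# so the result is not a valid UTF-8 string.
mislabelled <- x_cp1251
Encoding(mislabelled) <- "UTF-8"
relabel_is_valid <- validUTF8(mislabelled)

# iconv() re-encodes the bytes from Windows-1251 to UTF-8.
x_utf8 <- iconv(x_cp1251, from = "CP1251", to = "UTF-8")
convert_is_valid <- validUTF8(x_utf8)

# Code points of the converted string: U+041D and U+0435.
code_points <- utf8ToInt(x_utf8)
```

In a session whose native encoding is Windows-1251, `enc2utf8()` should perform the same conversion without naming the source encoding. In the document itself, the result likely needs to be assigned back, for example `df$x <- enc2utf8(df$x)`, for later chunks to see the converted strings.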
Upvotes: 0