Reputation: 11120
I have a doc.Rnw file that is supposed to produce some Russian UTF-8 strings:
\documentclass{article}
\usepackage{inputenc}
\inputencoding{utf8}
\usepackage[main=english,russian]{babel}
\begin{document}
\selectlanguage {russian}
<<test, results='asis', echo=FALSE>>=
print(readLines('string.rus', encoding="UTF-8"))
print("Здравствуйте")
@
Здравствуйте
\selectlanguage {english}
\end{document}
string.rus contains a UTF-8 string that displays correctly in the R console:
print(readLines('string.rus', encoding="UTF-8"))
# [1] "Здравствуйте"
doc.Rnw displays correctly in Windows Notepad, while both:
file.show("doc.Rnw")
file.show("doc.Rnw", encoding="UTF-8")
fail to display the UTF-8 strings properly.
Using:
knit("doc.Rnw")
The document part of the resulting doc.tex is:
\begin{document}
\selectlanguage {russian}
[1] "<U+0417><U+0434><U+0440><U+0430><U+0432><U+0441><U+0442><U+0432><U+0443><U+0439><U+0442><U+0435>"
[1] " <U+0097>д <U+0080>авс <U+0082>в <U+0083>й <U+0082>е"
Здравствуйте
\selectlanguage {english}
\end{document}
which, of course, does not compile with pdfLaTeX. Using:
knit("doc.Rnw", encoding="UTF-8")
gives even worse results.
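The <U+0417>-style sequences in doc.tex match the escapes R produces when it has to convert UTF-8 text into a code page that has no Cyrillic letters. The sketch below reproduces that pattern with iconv() in current R versions. Using "latin1" as a stand-in for the Windows native code page (for example CP1252) is an assumption made for illustration; this is not a trace of what knitr does internally.

```r
# "Здравствуйте" written with \u escapes, so R marks it as UTF-8 on any platform
x <- "\u0417\u0434\u0440\u0430\u0432\u0441\u0442\u0432\u0443\u0439\u0442\u0435"

# latin1 stands in (assumption) for a native code page without Cyrillic.
# By default, characters that cannot be represented make the result NA ...
plain <- iconv(x, from = "UTF-8", to = "latin1")

# ... while sub = "Unicode" replaces each of them with a <U+xxxx> escape,
# the same pattern that ended up in doc.tex
escaped <- iconv(x, from = "UTF-8", to = "latin1", sub = "Unicode")

print(Encoding(x))
print(plain)
print(escaped)
```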
Commenting out the chunk lines that should generate the UTF-8 strings:
print(readLines('string.rus', encoding="UTF-8"))
print("Здравствуйте")
gives a valid doc.tex, which compiles with MiKTeX and correctly shows the remaining UTF-8 string.
Even if I comment out only the first print() and leave the second one, I can't compile. Since the document compiles once the chunk output is gone, this seems to show that the encoding of doc.Rnw itself is correct.
I tried replacing both print() commands with:
a="Здравствуйте"
Encoding(a)="UTF-8"
print(a)
In this case the document compiles, but the PDF output is as follows (the first string runs past the margin):
[1] «U+0417><U+0434><U+0440><U+0430><U+0432><U+0441><U+0442><U+0432><U+0443>
Здравствуйте
So the chunk output is still wrong.
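One way to see why setting Encoding(a)="UTF-8" did not help: Encoding<- only relabels the bytes R already holds; it never converts them. The sketch below builds a string that has UTF-8 bytes but no declared encoding, which is assumed (not verified here) to be how a literal from a UTF-8 file looks to an R 3.3 session on Windows, and then relabels it. The two-letter string is a hypothetical stand-in for the full word.

```r
# Hypothetical stand-in: the UTF-8 bytes of "Зд" with no declared encoding
a <- rawToChar(as.raw(c(0xd0, 0x97, 0xd0, 0xb4)))
mark_before <- Encoding(a)

# Relabel the same bytes as UTF-8; nothing is converted
Encoding(a) <- "UTF-8"
mark_after <- Encoding(a)

print(c(before = mark_before, after = mark_after))
print(charToRaw(a))                # still the bytes d0 97 d0 b4
print(nchar(a, type = "bytes"))    # 4 bytes ...
print(nchar(a, type = "chars"))    # ... now read as 2 UTF-8 characters
```

Once the string is declared UTF-8, a session whose native code page lacks Cyrillic has to translate it before printing, which is plausibly where the <U+xxxx> escapes come from.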
How can I properly print UTF-8 strings from chunks?
R version is 3.3.3 (2017-03-06) for Windows and knitr is 1.15.1 (2016-11-22).
Reputation: 11120
An extended working example is below:
\documentclass{article}
\usepackage{inputenc}
\inputencoding{utf8}
\usepackage[main=english,russian]{babel}
\begin{document}
\selectlanguage {russian}
<<test, results='asis', echo=FALSE>>=
s=readLines('string.rus', encoding="UTF-8")
message("s ", Encoding(s), ": ", s)
Encoding(s)="latin1"
message("s latin1: ", s)
Encoding(s)="unkwnown"
message("s unkwnown: ", s)
Encoding(s)="utf8"
message("s utf8: ", a)  # note: this prints a, not s; a is only assigned on the next line
a="Здравствуйте"
message("a ", Encoding(a), ": ", a)
Encoding(a)="latin1"
message("a latin1: ", a)
Encoding(a)="utf8"
message("a utf8: ", a)
Encoding(a)="UTF-8"
message("a UTF-8: ", a)
u=("\U0417")
message("u ", Encoding(u), ": ", u)
Encoding(u)="latin1"
message("u latin1: ", u)
Encoding(u)="unkwnown"
message("u unkwnown: ", u)
@
Здравствуйте
\selectlanguage {english}
\end{document}
After knit("doc.Rnw"), this is the output of the test chunk found in doc.tex (with knitr's code decoration removed for readability):
s UTF-8: <U+0417><U+0434><U+0440><U+0430><U+0432><U+0441><U+0442><U+0432><U+0443><U+0439><U+0442><U+0435>
s latin1: Здравствуйте
s unkwnown: Здравствуйте
s utf8: <U+0417><U+0434><U+0440><U+0430><U+0432><U+0441><U+0442><U+0432><U+0443><U+0439><U+0442><U+0435>
a unknown: Здравствуйте
a latin1: Здравствуйте
a utf8: Здравствуйте
a UTF-8: <U+0417><U+0434><U+0440><U+0430><U+0432><U+0441><U+0442><U+0432><U+0443><U+0439><U+0442><U+0435>
u UTF-8: <U+0417>
u latin1: З
u unkwnown: З
Some comments follow.
First, only message() works; print() always gives errors.
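Apart from encoding, print() is an awkward fit for results='asis': it adds the [1] index and the quotes, which is why the chunk output in the question starts with [1] ". cat() writes only the characters themselves (so does message(), although it goes to stderr and knitr captures it separately). A minimal illustration with ASCII text, so it behaves the same in any locale:

```r
s <- "Hello"

# print() decorates the value with the [1] index and quotes
printed <- capture.output(print(s))

# cat() writes only the characters (plus the newline we ask for)
catted <- capture.output(cat(s, "\n", sep = ""))

print(printed)
print(catted)
```

Inside the chunk this would become, for example, cat(readLines('string.rus', encoding = "UTF-8"), sep = "\n"). This sketch does not show whether that also avoids the <U+xxxx> escapes on Windows, since cat() still writes through the native encoding.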
For both the externally read string s and the locally set a, the behavior is weird. In fact, keeping or explicitly setting the encoding to UTF-8 produces the wrong result (utf8 works for a).
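Part of the utf8 puzzle has a simple cause. R's documentation for Encoding states that values other than "latin1", "UTF-8" and "bytes" are silently treated as "unknown". So Encoding(x)="utf8" and Encoding(x)="unkwnown" in the test chunk both just remove the declared encoding, which is why they behave like each other. A quick check:

```r
x <- "\u0417"             # non-ASCII, marked "UTF-8" because of the \u escape

Encoding(x) <- "utf8"      # not a recognised encoding name
after_utf8 <- Encoding(x)

Encoding(x) <- "UTF-8"     # restore the mark
Encoding(x) <- "unkwnown"  # misspelled, also not recognised
after_typo <- Encoding(x)

print(c(utf8 = after_utf8, unkwnown = after_typo))
```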
One might think that the UTF-8 encoding of the files (doc.Rnw and string.rus) is not properly set. This is why I added the line u=("\U0417"), which is UTF-8 for sure. Again, only removing the UTF-8 encoding gives proper output.
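The u=("\U0417") test can be checked directly: a \U escape makes R store the character as UTF-8 and mark it that way, and dropping the mark leaves the same two bytes in place. A sketch:

```r
u <- "\U0417"                    # Cyrillic capital Ze, U+0417
mark_before <- Encoding(u)       # marked "UTF-8" because of the escape
bytes_before <- charToRaw(u)     # its UTF-8 byte sequence: d0 97

Encoding(u) <- "unknown"         # drop the declared encoding
mark_after <- Encoding(u)
bytes_after <- charToRaw(u)      # unchanged

print(c(before = mark_before, after = mark_after))
print(bytes_before)
print(bytes_after)
```

One possible reading (an assumption, not confirmed by this sketch): without the mark, R has no reason to translate the bytes, so they reach the UTF-8 .tex file untouched. That would be consistent with the latin1 and unknown variants printing correctly.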
Similarly, explicitly declaring UTF-8:
knit("doc.Rnw", encoding="UTF-8")
does not produce the UTF-8 characters, but their Unicode code points or garbled ones.
In the end, I can produce the desired .tex file and compile it with LaTeX, but the reason for the above counter-intuitive behavior is beyond me.
Hopefully someone can give a good explanation.