blazej

Reputation: 1788

R: Encoding of labelled data and knit to html problems

First of all, sorry for posting images instead of a reproducible example; I explain why at the end.

I'd really appreciate some help, whether comments or otherwise. I did my best to be as specific and concise as I can.

The problem I'm trying to solve is how (and where) to set the encoding so that Polish letters display correctly after an .Rmd document is knitted to HTML.

I'm working with a labelled SPSS file imported into R via the haven library, and I use sjPlot tools to make tables and graphs.

I've already spent almost a whole day trying to sort this out, but I'm stuck with no idea where to go next.

My sessionInfo():

R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250    LC_MONETARY=Polish_Poland.1250
[4] LC_NUMERIC=C                   LC_TIME=Polish_Poland.1250    

Whenever I run (via console / script)

sjt.frq(df$sex, encoding = "Windows-1250")

I get a nice table with proper encoding in the RStudio viewer pane:

[screenshot]

Trying without the encoding argument, sjt.frq(df$sex), gives this:

[screenshot]

I could live with setting the encoding each time sjt.frq is called, but the problem is that no matter how I set up sjt.frq inside a markdown document, it always gets knitted the wrong way.

Running the chunk inside the .Rmd is OK (for a completely unknown reason, encoding = "UTF-8" worked here as well, although it didn't previously):

[screenshot]

Knitting the same document is not OK (note that the HTML header has all the Polish characters):

[screenshot]

It also looks like the problem could be specific to either HTML or sjPlot, because knitr can print Polish letters when they are in a vector and are output as if they were printed to the console:

[screenshot]

Is there anything I can set up / change in order to make this work?

While testing different options I discovered that manually converting the sex variable to a factor and assigning the labels again works: RStudio then knits to HTML with proper encoding.

df$sex <- factor(df$sex, labels = c("kobieta", "mężczyzna"))
sjt.frq(df$sex, encoding = "Windows-1250")
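
The workaround above replaces the imported labels with strings typed in the script. A minimal sketch of a more general variant, in base R, is to re-encode the existing value labels instead of retyping them. It assumes that the value labels sit in the "labels" attribute as the names of a named vector (which is how haven stores them) and that their bytes are Windows-1250. labels_to_utf8 is a hypothetical helper, not part of haven or sjPlot, and whether it resolves the knitting problem for a particular file is not verified here.

```r
# Hypothetical helper: re-encode the value labels of a labelled vector to UTF-8.
# Assumption: labels are the names of the "labels" attribute and are
# currently encoded as Windows-1250.
labels_to_utf8 <- function(x, from = "windows-1250") {
  labs <- attr(x, "labels")
  if (!is.null(labs) && !is.null(names(labs))) {
    names(labs) <- iconv(names(labs), from = from, to = "UTF-8")
    attr(x, "labels") <- labs
  }
  x
}

# Fake data: the second label is "mężczyzna" written as raw Windows-1250 bytes.
cp1250_label <- rawToChar(as.raw(c(0x6d, 0xea, 0xbf, 0x63, 0x7a, 0x79, 0x7a, 0x6e, 0x61)))
sex <- c(1, 2, 2, 1)
attr(sex, "labels") <- stats::setNames(c(1, 2), c("kobieta", cp1250_label))

sex_utf8 <- labels_to_utf8(sex)
utf8_names <- names(attr(sex_utf8, "labels"))
```

With real data the call would be something like df$sex <- labels_to_utf8(df$sex) before the table is built; the underlying values and the rest of the attributes are left untouched.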

Regarding no reproducible example:

I tried to simulate this example with fake data:

# Get libraries
library(sjPlot)
library(sjlabelled)

x <- rep(1:4, 4)
x <- set_labels(x, labels = c("ąę", "ćŁ", "óŚŚ", "abcd"))

# Run freq table similar to df$sex above
sjt.frq(x)
sjt.frq(x, encoding = "UTF-8")
sjt.frq(x, encoding = "Windows-1250")

The thing is, each sjt.frq call knits the way it should (although only encoding = "Windows-1250" renders properly in the RStudio viewer pane).

Upvotes: 1

Views: 1167

Answers (1)

Daniel

Reputation: 7832

If you run sjt.frq(), a complete HTML page is returned and displayed in the viewer.

However, inside markdown/knitr documents only parts of that HTML output are required: you don't need the <head> part, for instance, because the knitr document creates its own header for the HTML page. For that reason there is a separate print() method for knitr documents, which uses a different return value to include in the knitted file.

Compare:

dummy <- sjt.frq(df$sex, encoding = "Windows-1250")
dummy$output.complete # used for default display in viewer
dummy$knitr           # used in knitr-documents

Since the encoding is declared in the <meta> tag, which is not included in the $knitr value, the encoding argument of sjt.frq() has no effect in knitr documents.

I think this might help: rmarkdown::render_site(encoding = 'UTF-8'). There may also be other options for encoding the text, or you may need to edit the final HTML file and change the charset declaration there.
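
As a rough sketch of the last option (editing the charset in the final HTML file), the following base R snippet rewrites the charset declared in <meta> tags. set_html_charset and the demo file are hypothetical, made up for illustration; nothing here comes from sjPlot or rmarkdown. Keep in mind that this changes how the browser decodes the whole page, so any text that is already stored as UTF-8 would then be displayed incorrectly.

```r
# Hypothetical helper: change the charset declared in the <meta> tags of an HTML file.
set_html_charset <- function(path, charset) {
  html <- readLines(path, encoding = "UTF-8", warn = FALSE)
  # Only touch lines that contain a <meta> tag; keep an opening quote if present.
  is_meta <- grepl("<meta", html, ignore.case = TRUE)
  html[is_meta] <- sub("(charset=[\"']?)[A-Za-z0-9_-]+",
                       paste0("\\1", charset),
                       html[is_meta], ignore.case = TRUE)
  # Write the bytes back unchanged apart from the edited declaration.
  writeLines(html, path, useBytes = TRUE)
  invisible(path)
}

# Demo on a throwaway file standing in for the knitted document.
demo <- tempfile(fileext = ".html")
writeLines(c("<html><head>",
             "<meta charset=\"utf-8\">",
             "</head><body><p>kobieta</p></body></html>"),
           demo, useBytes = TRUE)

set_html_charset(demo, "windows-1250")
result <- readLines(demo, encoding = "UTF-8", warn = FALSE)
```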

Upvotes: 3
