Yozef

Reputation: 125

UTF-8 formatting problems in R

I'm trying to convert a Markdown file to a .docx file with pandoc. Unfortunately, pandoc stubbornly refuses, complaining that the input is not UTF-8 encoded.

When creating the Markdown file, I use text data from an Excel file written in English. Two of the columns are reported as having "unknown" encoding by Encoding(), which I checked following "How to identify/delete non-UTF-8 characters in R". An example vector from one of these columns (holding data categories) is shown below:

exampleVector
 [1] "other wards"  "organisation" "other wards"  "Trystview"    "break"        "other wards" 
 [7] "Trystview"    "other"        "break"        "other"  

exampleVector %>% Encoding()
 [1] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"

exampleVector %>% dput()
c("other wards", "organisation", "other wards", "Trystview", 
"break", "other wards", "Trystview", "other", "break", "other"
)
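
As a side note on the output above: in base R, ASCII-only strings are never given a declared encoding, so an "unknown" mark from Encoding() does not by itself indicate invalid bytes. The minimal sketch below checks the bytes directly with base R's validUTF8(). The shortened vector and the name latin1Bytes are illustrative assumptions, not part of the original data.

```r
# Shortened copy of the example vector (ASCII-only text)
exampleVector <- c("other wards", "organisation", "Trystview", "break", "other")

# The declared mark: ASCII-only strings are never marked, so this is "unknown"
marks <- Encoding(exampleVector)
print(marks)

# The actual bytes: validUTF8() tests whether each string is valid UTF-8,
# regardless of the declared mark
isValid <- validUTF8(exampleVector)
print(isValid)

# Illustrative counter-example (an assumption, not from the Excel file):
# the Latin-1 bytes for "cafe" with an acute accent, which are not valid UTF-8
latin1Bytes <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
print(validUTF8(latin1Bytes))
```

If validUTF8() reports TRUE for every element, the vector itself is unlikely to be the source of pandoc's complaint, and the way the file is written to disk is the next place to look.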

I've tried all the suggestions in "How to identify/delete non-UTF-8 characters in R" and "Force character vector encoding from "unknown" to "UTF-8" in R" without success, including the functions in the stringi package for converting the vector above to UTF-8. I'm not sure what I'm missing, and I wonder why a fairly mundane Excel file poses such a challenge for pandoc. I imported the Excel data with read_excel from the readxl package. I would be grateful for any suggestions.

Upvotes: 1

Views: 1382

Answers (1)

Yozef

Reputation: 125

I found the answer to my frustrations! I only had to add the argument encoding = "UTF-8" to the file() call that creates the Markdown file in my R code:

fileConn <- file("C:/projects/use of time/report1.md", encoding = "UTF-8")
close(fileConn)
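
For context, a connection opened with file() and no encoding argument uses getOption("encoding"), which defaults to the native encoding of the session. The snippet above omits the write step, so the following is only a sketch of how the complete sequence might look. The temporary path, the reportLines content, and the read-back check are assumptions added for illustration.

```r
# Hypothetical report content; \u00e9 is a non-ASCII character (e with acute)
reportLines <- c("# Use of time", "", "Caf\u00e9 breaks: 3")

# Temporary path used for illustration instead of the real report location
reportPath <- tempfile(fileext = ".md")

# Open the connection for writing and ask R to re-encode output as UTF-8
fileConn <- file(reportPath, open = "w", encoding = "UTF-8")
writeLines(reportLines, fileConn)
close(fileConn)

# Read the raw bytes back and confirm that they form valid UTF-8,
# which is the encoding pandoc expects for its input
rawBytes <- readBin(reportPath, what = "raw", n = file.size(reportPath))
isUtf8 <- validUTF8(rawToChar(rawBytes))
print(isUtf8)
```

Checking the written file this way before calling pandoc can help separate encoding problems in the data from encoding problems introduced when the file is written.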

Upvotes: 1
