Reputation: 125
I'm trying to convert a Markdown file to a .docx file with pandoc. Unfortunately, pandoc keeps complaining that the file is not encoded as "UTF-8".
I build the Markdown file from text data in an Excel file written in English. According to "Encoding", which I used as suggested in How to identify/delete non-UTF-8 characters in R, two of the columns have an "unknown" encoding. An example vector for one of these columns (holding data categories) is shown below:
exampleVector
[1] "other wards" "organisation" "other wards" "Trystview" "break" "other wards"
[7] "Trystview" "other" "break" "other"
exampleVector %>% Encoding()
[1] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
exampleVector %>% dput()
c("other wards", "organisation", "other wards", "Trystview",
"break", "other wards", "Trystview", "other", "break", "other"
)
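For reference, here is a minimal sketch of how the encoding marks can be inspected in base R. It assumes a shortened copy of the vector above. Base R leaves pure-ASCII strings unmarked, so "unknown" from Encoding() does not by itself mean the bytes are invalid UTF-8; validUTF8() checks the bytes directly.

```r
# Shortened copy of the example vector from the question.
exampleVector <- c("other wards", "organisation", "Trystview", "break", "other")

# Pure-ASCII strings carry no encoding mark, so each element reports "unknown".
encodingMarks <- Encoding(exampleVector)
print(encodingMarks)

# validUTF8() inspects the actual bytes; ASCII is always valid UTF-8.
bytesAreValid <- validUTF8(exampleVector)
print(bytesAreValid)

# enc2utf8() converts to UTF-8 where needed; ASCII content is left unchanged.
converted <- enc2utf8(exampleVector)
print(identical(converted, exampleVector))
```

If validUTF8() returns TRUE for every element, the strings themselves are probably not the problem, and the encoding used when writing the file is worth checking next.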
I've tried all the suggestions in How to identify/delete non-UTF-8 characters in R and Force character vector encoding from "unknown" to "UTF-8" in R, including the functions in the "stringi" library for converting the above vector to "UTF-8", without success. I'm not sure what I'm missing, and I wonder why a fairly mundane Excel file poses such a challenge for pandoc. I imported the Excel data with read_excel from the "readxl" library. I'd be grateful for any suggestions.
Upvotes: 1
Views: 1382
Reputation: 125
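I found the answer to my frustrations! I only had to add the argument encoding = "UTF-8" to the file() call that creates the Markdown file in the R code. Without it, file() writes in the session's native encoding, which on Windows is often not UTF-8, and pandoc expects UTF-8 input: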
fileConn <- file("C:/projects/use of time/report1.md", encoding = "UTF-8")
close(fileConn)
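A slightly fuller sketch of the same idea, including the write step between opening and closing the connection. The report lines and the use of tempfile() are placeholders for illustration only; the real content and path come from your own project.

```r
# Hypothetical report content; the real lines are built from the Excel data.
reportLines <- c("# Use of time", "", "Category: Trystview", "Category: other wards")

# tempfile() stands in for the real output path so the sketch runs anywhere.
mdPath <- tempfile(fileext = ".md")

# Open the connection for writing and declare UTF-8 as the output encoding.
fileConn <- file(mdPath, open = "w", encoding = "UTF-8")
writeLines(reportLines, fileConn)
close(fileConn)

# Read the file back and confirm that every line is valid UTF-8.
readBack <- readLines(mdPath, encoding = "UTF-8")
print(all(validUTF8(readBack)))
```

The resulting file can then be passed to pandoc as usual.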
Upvotes: 1