Reader 123

Reputation: 367

Non-ASCII (box-drawing) characters in R script - text output converts them to other characters despite encoding UTF-8

Using Windows 10, R 4.0.3 and RStudio 1.4.1103

I have a script (written by a developer), the output of which is a kind of a tree diagram in txt. The piece of code is:

for (index in seq_len(nrow(file))) {
  write(paste0(path, if (index == nrow(file)) '└' else '├', '──', file[index, 'name']), tree_filename, append = TRUE)

  newPath = if (index == nrow(file)) paste0(path, '    ') else paste0(path, '│   ')
  treefunction(file[index, id_column_header], newPath)
}

The characters │ and └ appear correctly in RStudio when typed into the code. However, when the output of the function is saved to a .txt file, these characters become +'s and -'s for me, while for the developer everything works perfectly (please see the image below with both outputs).

What I have tried so far: I have set UTF-8 in .Rprofile, and the .txt file is encoded in UTF-8 (I have checked).

The developer is using Linux (I'm not sure which version). Could someone please help with what I should do so that characters like └ display as they should? Thank you very much.

Upvotes: 1

Views: 1340

Answers (1)

David J. Bosak

Reputation: 1624

First of all, I sympathize with you. Encoding on Windows is a nightmare. There is a guy named Tomas Kalibera on the R Core team who is working to fix this. Probably in the next year or so it will be greatly improved. Here is a link that explains how he is going to fix it.

Second, I think you can solve your problem now by making a few changes to the way you are writing the strings:

  1. Use Unicode character codes instead of direct strings. These codes are known as "box drawing codes". A complete list and further information can be found here.

  2. Open your file with encoding = 'native.enc'

  3. Use writeLines with the useBytes = TRUE option instead of write.

Here is an example:

f <- file("test.txt", open = "w", encoding = "native.enc")
writeLines("\U251C\U2500\U2500 Herr Dvorek Frank von Lakatos", f, useBytes = TRUE)
writeLines("\U2502   \U2514\U2500\U2500 Dr Maria Lakatos", f, useBytes = TRUE)
close(f)
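
For the recursive loop in the question, the same three changes might look roughly like the sketch below. This is only an illustration under assumptions: the `nodes` data frame, its `id`/`parent`/`name` columns, and the `write_tree` function are hypothetical stand-ins for your `file` data frame and `treefunction`, which I can't see in full.

```r
# Hypothetical node table standing in for the asker's `file` data frame.
nodes <- data.frame(
  id     = c(1, 2, 3, 4),
  parent = c(0, 0, 1, 1),
  name   = c("alpha", "beta", "gamma", "delta"),
  stringsAsFactors = FALSE
)

# Box-drawing characters as Unicode escapes, so the script itself
# contains only ASCII and does not depend on the editor's encoding.
BRANCH <- "\u251c\u2500\u2500"  # tee followed by two horizontal bars
LAST   <- "\u2514\u2500\u2500"  # corner followed by two horizontal bars
PIPE   <- "\u2502   "           # vertical bar followed by three spaces
BLANK  <- "    "                # four spaces

# Hypothetical replacement for `treefunction`: writes the children of
# `parent_id` to an already open connection, then recurses into each child.
write_tree <- function(con, nodes, parent_id, path = "") {
  children <- nodes[nodes$parent == parent_id, , drop = FALSE]
  for (index in seq_len(nrow(children))) {
    is_last <- index == nrow(children)
    line <- paste0(path, if (is_last) LAST else BRANCH, children$name[index])
    # enc2utf8 + useBytes = TRUE: write the UTF-8 bytes unchanged.
    writeLines(enc2utf8(line), con, useBytes = TRUE)
    new_path <- paste0(path, if (is_last) BLANK else PIPE)
    write_tree(con, nodes, children$id[index], new_path)
  }
}

tree_filename <- tempfile(fileext = ".txt")
con <- file(tree_filename, open = "w", encoding = "native.enc")
write_tree(con, nodes, parent_id = 0)
close(con)
```

Opening the connection once and passing it down also avoids reopening the file for every line, which is what write(..., append = TRUE) does.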

The result in Notepad++ looks like this:

[Image: box codes rendered in Notepad++]

I'm working in the same environment as you, so I think this should work.

If you need to read the file back in, use this:

mylines <- readLines("test.txt", encoding = "UTF-8")
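
If you want to check what actually landed on disk, independent of how an editor displays it, one option is to look at the raw bytes. The sketch below assumes a throwaway temp file (the `path` and `is_utf8_tee` names are just for illustration). In UTF-8, U+251C is the three bytes e2 94 9c, whereas a substituted "+" would show up as the single byte 2b.

```r
# Write one line the same way as in the example above, to a temp file.
path <- tempfile(fileext = ".txt")
con <- file(path, open = "w", encoding = "native.enc")
writeLines("\u251c\u2500\u2500 example", con, useBytes = TRUE)
close(con)

# Read the file back as raw bytes and inspect the first character.
bytes <- readBin(path, what = "raw", n = file.size(path))
first_three <- bytes[1:3]
print(first_three)

# TRUE if the file starts with the UTF-8 encoding of U+251C.
is_utf8_tee <- identical(first_three, as.raw(c(0xE2, 0x94, 0x9C)))

# Reading back as text, marking the strings as UTF-8.
lines_back <- readLines(path, encoding = "UTF-8")
```

If `is_utf8_tee` is FALSE for your own output file, the characters were already replaced at write time, not by the program you are viewing the file in.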

Upvotes: 1
