Tal Galili

Reputation: 25306

Convert a file encoding using R? (ANSI to UTF-8)

I wish to convert an HTML file encoded in ANSI to UTF-8, using R.

Is there a tool, or a combination of tools, that can make this work?

Thanks.

Edit: OK, I've narrowed my problem down to a different one, which is re-posted here: Using "cat" to write non-English characters into a .html file (in R)

Upvotes: 10

Views: 19742

Answers (2)

kohske

Reputation: 66842

You can use iconv:

writeLines(iconv(readLines("tmp.html"), from = "ANSI_X3.4-1986", to = "UTF8"), "tmp2.html")

tmp2.html should then be encoded in UTF-8.


Edit by Henrik in June 2015:
A working solution for Windows distilled from the comments is as follows:

writeLines(iconv(readLines("tmp.html"), from = "ANSI_X3.4-1986", to = "UTF8"), 
           file("tmp2.html", encoding="UTF-8"))

Update 2021: If ANSI is the encoding of the current locale, the following works as well (from = "" tells iconv to use the locale's encoding as the source):

writeLines(iconv(readLines("tmp.html"), from = "", to = "UTF8"), 
           file("tmp2.html", encoding="UTF-8"))
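To check that a conversion like this really produced UTF-8, a round trip on a small test file can help. The following is only a sketch under a few assumptions: the temporary file names and sample bytes are made up, and "latin1" stands in for the source encoding because it is available on every R platform (on many Windows setups "ANSI" means CP1252, which agrees with Latin-1 for the character used here). As far as I know, "ANSI_X3.4-1986" is an alias for plain US-ASCII, so for files with accented characters you will probably want to name the real source encoding instead.

```r
# Sketch: convert a Latin-1 encoded file to UTF-8 and verify the result.
# File names and sample bytes are hypothetical, chosen for illustration.
src <- tempfile(fileext = ".html")
dst <- tempfile(fileext = ".html")

# Write "<p>café</p>\n" as raw Latin-1 bytes (0xE9 is "é" in Latin-1).
writeBin(as.raw(c(0x3C, 0x70, 0x3E, 0x63, 0x61, 0x66, 0xE9,
                  0x3C, 0x2F, 0x70, 0x3E, 0x0A)), src)

# Read the lines as they are, then re-encode them explicitly.
lines_in   <- readLines(src, warn = FALSE)
lines_utf8 <- iconv(lines_in, from = "latin1", to = "UTF-8")

# useBytes = TRUE writes the UTF-8 bytes unchanged, whatever the locale is.
writeLines(lines_utf8, dst, useBytes = TRUE)

# Check: every line of the output file must be valid UTF-8.
ok <- all(validUTF8(readLines(dst, warn = FALSE)))
```

Writing with useBytes = TRUE is a different choice from passing file(..., encoding = "UTF-8") as in the snippets above. It avoids a second translation through the native encoding, which seems safer when the session locale cannot represent every character.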

Upvotes: 23

ExaFusion

Reputation: 11

I had some problems with the solutions proposed above, especially with the TAB character. This alternative has never let me down. Unfortunately, it only works on UNIX-like systems.

system('iconv -f CP1252 -t UTF-8 < tmp.html > tmp2.html')
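Since the command depends on an external iconv binary, it may be worth guarding the call. Below is a minimal sketch, assuming iconv is on the PATH; the helper name convert_with_iconv and its defaults are hypothetical, not part of any package.

```r
# Sketch: shell out to the iconv command-line tool only if it is available.
# convert_with_iconv is a made-up helper name used for illustration.
convert_with_iconv <- function(src, dst, from = "CP1252", to = "UTF-8") {
  if (!nzchar(Sys.which("iconv"))) {
    stop("iconv was not found on the PATH")
  }
  # system2() does not quote arguments itself, so quote the input path.
  # stdout = dst redirects the converted text into the output file.
  status <- system2("iconv",
                    args = c("-f", from, "-t", to, shQuote(src)),
                    stdout = dst)
  if (status != 0) {
    stop("iconv failed with exit status ", status)
  }
  invisible(dst)
}
```

A call such as convert_with_iconv("tmp.html", "tmp2.html") would then do the same job as the one-liner above, but stop with an error instead of silently leaving an empty or partial tmp2.html behind.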

Upvotes: 0
