Reputation: 3515
I am scraping names from the web into a data frame.
For a name such as "Tomáš Rosický", I get a result "TomÃ¡Å¡ RosickÃ½".
I tried
Encoding("TomÃ¡Å¡ RosickÃ½") # with latin1 response
but was not sure where to go from there to get the original name with accents back. I played around with iconv without success.
I would be satisfied with (and might even prefer) an output of "Tomas Rosicky".
Upvotes: 15
Views: 53651
Reputation: 1586
To read the file correctly, use the scan function and state the file's encoding:
namb <- scan(file='g:/testcodering.txt', fileEncoding='UTF-8',
what=character(), sep='\n', allowEscapes=TRUE)
cat(namb)
This also works:
namc <- readLines(con <- file('g:/testcodering.txt', "r",
encoding='UTF-8')); close(con)
cat(namc)
This will read the file with the correct accents.
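To try the same idea without g:/testcodering.txt, here is a minimal sketch that writes a UTF-8 file to a temporary path and reads it back in two ways. The sample name, the temporary file, and the variable names are assumptions made for illustration, and the connection-based read assumes the session locale can represent the accented characters.

```r
# Sample name built from \u escapes so the script itself stays pure ASCII.
x <- "Tom\u00e1\u0161 Rosick\u00fd"

# Write the raw UTF-8 bytes to a temporary file (a hypothetical stand-in
# for the scraped data).
tmp <- tempfile(fileext = ".txt")
writeLines(x, tmp, useBytes = TRUE)

# Option 1: let the connection re-encode from UTF-8 to the native encoding.
con <- file(tmp, "r", encoding = "UTF-8")
via_connection <- readLines(con)
close(con)

# Option 2: read the bytes unchanged and only mark them as UTF-8.
marked <- readLines(tmp, encoding = "UTF-8")

unlink(tmp)
```

Option 2 does not convert anything, so it should behave the same regardless of locale; Option 1 is closer to what the answer above does.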
Upvotes: 5
Reputation: 71
You should use this:
df$colname <- iconv(df$colname, from="UTF-8", to="LATIN1")
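One situation where this conversion can help is text that was encoded twice: the UTF-8 bytes were treated as latin1 and converted to UTF-8 again. The sketch below simulates that damage and undoes it. It is only an illustration under that assumption; the sample name and variable names are made up, and whether scraped data is actually double-encoded has to be checked case by case.

```r
good <- "Tom\u00e1\u0161 Rosick\u00fd"

# Simulate the damage: treat the UTF-8 bytes as latin1 and convert "again".
garbled <- iconv(good, from = "latin1", to = "UTF-8")

# Undo it: converting back to latin1 restores the original byte sequence...
fixed <- iconv(garbled, from = "UTF-8", to = "latin1")

# ...but those bytes are really UTF-8, so declare them as such.
Encoding(fixed) <- "UTF-8"
```

Note the last step: after the iconv call alone, the bytes are correct but may still be labelled as latin1, so the name can keep printing incorrectly until the encoding is declared.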
Upvotes: 7
Reputation: 31
A way to export accents correctly:
enc2utf8(as(dataframe$columnname, "character"))
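As a small illustration of what enc2utf8 does, the sketch below starts from a string marked as latin1 and converts it to UTF-8. The sample word is an assumption, and as.character() is used here as a base-R substitute for as(..., "character").

```r
# A latin1-encoded string: byte 0xE9 is "é" in latin1.
x <- "caf\xe9"
Encoding(x) <- "latin1"

# Convert to UTF-8; the bytes change and the result is marked as UTF-8.
y <- enc2utf8(as.character(x))

Encoding(y)   # "UTF-8"
charToRaw(y)  # 63 61 66 c3 a9
```

This only gives the right result when the input's declared encoding is correct; enc2utf8 converts from whatever encoding the string is marked with.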
Upvotes: 3
Reputation: 57696
You've read in a page encoded in UTF-8. If x is your column of names, use Encoding(x) <- "UTF-8".
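A minimal sketch of this fix, assuming the bytes in x are already valid UTF-8 and only the label is wrong. The sample name is illustrative. The last line is an optional extra for the accent-free form asked about in the question; the result of //TRANSLIT depends on the platform's iconv implementation, so no particular output is promised.

```r
x <- "Tom\u00e1\u0161 Rosick\u00fd"

# Simulate the problem: correct UTF-8 bytes carrying the wrong label.
Encoding(x) <- "latin1"

# The fix: relabel the bytes as UTF-8. No bytes are changed.
Encoding(x) <- "UTF-8"

# Optional: ask iconv to transliterate to plain ASCII
# (platform-dependent; may not yield "Tomas Rosicky" everywhere).
ascii <- iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT")
```

Because Encoding<- only changes the declared encoding, it is cheap and reversible, which makes it a reasonable first thing to try before any iconv conversion.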
Upvotes: 13