pssguy

Reputation: 3515

handling special characters e.g. accents in R

I am scraping names from the web into a data frame.

For a name such as "Tomáš Rosický", I get a garbled result such as "TomÃ¡Å¡ RosickÃ½".

I tried

Encoding("Tomáš Rosický") #  with latin1 response

but was not sure where to go from there to get the original name with accents back. I played around with iconv without success.

I would be satisfied with (and might even prefer) an output of "Tomas Rosicky".

Upvotes: 15

Views: 53651

Answers (4)

Mischa Vreeburg

Reputation: 1586

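To read the file correctly, use the scan function and declare the file's encoding:

# Read one string per line, converting from UTF-8 on the way in
namb <- scan(file = 'g:/testcodering.txt', fileEncoding = 'UTF-8',
             what = character(), sep = '\n', allowEscapes = TRUE)
cat(namb)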

This also works:

# Open the connection with an explicit encoding, read, then close it
con <- file('g:/testcodering.txt', "r", encoding = 'UTF-8')
namc <- readLines(con)
close(con)
cat(namc)

This will read the file with the correct accents.
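
For anyone who wants to try this without the file above, here is a rough, self-contained sketch. The temporary file and the sample name are made up for the example, and it uses the encoding argument of readLines (which only marks the strings as UTF-8) rather than a re-encoding connection, so treat it as an illustration and not as the exact method above:

```r
# Hypothetical sample name, written with \u escapes so the script stays ASCII
name <- "Tom\u00e1\u0161 Rosick\u00fd"

# Write the raw UTF-8 bytes to a temporary file (stand-in for the real file)
tmp <- tempfile(fileext = ".txt")
writeLines(enc2utf8(name), tmp, useBytes = TRUE)

# encoding = "UTF-8" marks the strings as UTF-8 without converting them
namc <- readLines(tmp, encoding = "UTF-8")

Encoding(namc)   # declared encoding of what was read
cat(namc, "\n")

unlink(tmp)      # clean up the temporary file
```

Whether the accents display properly in the console still depends on the locale of the R session.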

Upvotes: 5

Roadkill

Reputation: 71

You should use this:

df$colname <- iconv(df$colname, from="UTF-8", to="LATIN1")
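
A small sketch of what that call does, using a made-up data frame (df, colname and the two sample strings are placeholders). One caveat worth checking on your own platform: Latin-1 has no š, so a name like the one in the question may not survive the conversion.

```r
# Hypothetical data: one value that fits in Latin-1, one that does not
df <- data.frame(colname = c("caf\u00e9", "Tom\u00e1\u0161"),
                 stringsAsFactors = FALSE)

latin1 <- iconv(df$colname, from = "UTF-8", to = "latin1")

# e-acute (U+00E9) exists in Latin-1 and becomes the single byte 0xE9
charToRaw(latin1[1])

# s-caron (U+0161) has no Latin-1 equivalent; on many platforms the
# whole element comes back as NA
is.na(latin1[2])

# The 'sub' argument of iconv() substitutes unconvertible bytes instead
iconv(df$colname, from = "UTF-8", to = "latin1", sub = "?")
```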

Upvotes: 7

iulilia

Reputation: 31

A way to export accents correctly:

enc2utf8(as(dataframe$columnname, "character"))

Upvotes: 3

Hong Ooi

Reputation: 57696

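You've read in a page encoded in UTF-8, but R has marked the strings as latin1. If x is your column of names, use Encoding(x) <- "UTF-8" to re-declare the encoding; the bytes themselves are left unchanged.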
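
A minimal sketch of that fix, assuming the scraped string holds the UTF-8 bytes of the name but is declared as latin1. The string is built by hand here to imitate that situation, and the chartr step at the end is only one possible way to get the unaccented form, using a hand-written map that covers just the three accented letters in this name:

```r
# UTF-8 bytes of the name, deliberately mis-declared as latin1
x <- "Tom\xc3\xa1\xc5\xa1 Rosick\xc3\xbd"
Encoding(x) <- "latin1"
Encoding(x)            # what the question reported

# Re-declare the encoding; the underlying bytes are not changed
Encoding(x) <- "UTF-8"
nchar(x)               # now counted as characters, not bytes

# Optional: strip the accents with an explicit (assumed) character map
plain <- chartr("\u00e1\u0161\u00fd", "asy", x)
plain
```

For a whole column of names, iconv(x, "UTF-8", "ASCII//TRANSLIT") may be a shorter route to unaccented text, but its output varies between platforms, so check it on your own system.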

Upvotes: 13
