Reputation: 721
I am trying to grab some statistics from fifa.com using the XML package. The import succeeds, but the column names contain Unicode symbols that I want to remove.
This is how I got the data:
library(XML)
url <- "http://www.fifa.com/worldcup/statistics/teams/disciplinary.html"
foulbycountry <- readHTMLTable(url)
foulbycountry1 <- do.call(rbind.data.frame, foulbycountry)
The variable names include two characters that I want to remove. I have tried to create a new object but it is not working. For example,
country <- foulbycountry1$Teams▴▾
fouls.committed <- foulbycountry1$Fouls Committed▴▾
which gives me the following output,
> country <- foulbycountry1$Teams▴▾
Error: unexpected input in "country <- foulbycountry1$Teams�"
> fouls.committed <- foulbycountry1$Fouls Committed▴▾
Error: unexpected symbol in "fouls.committed <- foulbycountry1$Fouls Committed"
Is there a way to remove those extra Unicode characters from the column names?
Upvotes: 1
Views: 2078
Reputation: 44614
iconv is one option:
names(foulbycountry1) <- iconv(names(foulbycountry1), to='ASCII', sub='')
names(foulbycountry1)
# [1] "Teams" "Teams" "Matches Played"
# [4] "Yellow Card" "Second yellow card and red card" "Red Cards"
# [7] "Fouls Committed" "Fouls Suffered\r\n" "Fouls causing a penalty"
This removes any non-ASCII characters. One of the column names also has a line break at the end. To remove that too, you can use
names(foulbycountry1) <- gsub('\r|\n', '', iconv(names(foulbycountry1), to='ASCII', sub=''))
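If it helps, here is a minimal sketch that combines both steps and can be run without fetching the FIFA page. The toy data frame `fouls`, its values, and the helper name `clean_names` are assumptions made up for illustration; only the column names mimic the ones in the question. I pass `from = "UTF-8"` explicitly because the toy names are written with `\u` escapes; for the scraped table you may need a different `from`, or none at all.

```r
# Toy stand-in for foulbycountry1 (values are invented for illustration only).
fouls <- data.frame(
  c("Team A", "Team B"),
  c(10, 12),
  c(7, 9),
  stringsAsFactors = FALSE
)
# \u25b4 and \u25be are the small up/down triangles seen in the scraped headers.
names(fouls) <- c(
  "Teams\u25b4\u25be",
  "Fouls Committed\u25b4\u25be",
  "Fouls Suffered\u25b4\u25be\r\n"
)

# Hypothetical helper: drop non-ASCII characters, then line breaks,
# then any leading or trailing whitespace.
clean_names <- function(x) {
  x <- iconv(x, from = "UTF-8", to = "ASCII", sub = "")
  x <- gsub("[\r\n]", "", x)
  trimws(x)
}

names(fouls) <- clean_names(names(fouls))

# Names without spaces work with $ directly; names that contain spaces
# need [[ ]] or backticks.
country <- fouls$Teams
fouls.committed <- fouls[["Fouls Committed"]]
fouls.suffered <- fouls$`Fouls Suffered`
```

Note that a name such as "Fouls Committed" still contains a space after cleaning, which is why the second error in the question would persist with plain `$` access. If you would rather have syntactic names everywhere, `make.names(names(fouls))` is another option.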
Upvotes: 2