Reputation: 5193
I know this question has been asked many times (Converting Character to Numeric without NA Coercion in R, Converting Character\Factor to Numeric without NA Coercion in R, etc.) but I cannot seem to figure out what is going on in this one particular case (Warning message: NAs introduced by coercion). Here is some reproducible data I'm working with.
#dependencies
library(rvest)
library(dplyr)
library(pipeR)
library(stringr)
library(translateR)
#scrape data from website
url <- "http://irandataportal.syr.edu/election-data"
ir.pres2014 <- url %>%
read_html() %>%
html_nodes(xpath='//*[@id="content"]/div[16]/table') %>%
html_table(fill = TRUE)
ir.pres2014<-ir.pres2014[[1]]
colnames(ir.pres2014)<-c("province","Rouhani","Velayati","Jalili","Ghalibaf","Rezai","Gharazi")
ir.pres2014<-ir.pres2014[-1,]
#Get rid of unnecessary rows
ir.pres2014<-ir.pres2014 %>%
subset(province!="Votes Per Candidate") %>%
subset(province!="Total Votes")
#Get rid of commas
clean_numbers = function (x) str_replace_all(x, '[, ]', '')
ir.pres2014 = ir.pres2014 %>% mutate_each(funs(clean_numbers), -province)
#remove any possible whitespace in string
no_space = function (x) gsub(" ","", x)
ir.pres2014 = ir.pres2014 %>% mutate_each(funs(no_space), -province)
This is where things start going wrong for me. I tried each of the following lines of code but I got all NA's each time. For example, I begin by trying to convert the second column (Rouhani) to numeric:
#First check class of vector
class(ir.pres2014$Rouhani)
#convert character to numeric
ir.pres2014$Rouhani.num<-as.numeric(ir.pres2014$Rouhani)
Above returns a vector of all NA's. I also tried:
as.numeric.factor <- function(x) {seq_along(levels(x))[x]}
ir.pres2014$Rouhani2<-as.numeric.factor(ir.pres2014$Rouhani)
And:
ir.pres2014$Rouhani2<-as.numeric(levels(ir.pres2014$Rouhani))[ir.pres2014$Rouhani]
And:
ir.pres2014$Rouhani2<-as.numeric(paste(ir.pres2014$Rouhani))
All those return NA's. I also tried the following:
ir.pres2014$Rouhani2<-as.numeric(as.factor(ir.pres2014$Rouhani))
That created a list of single digit numbers so it was clearly not converting the string in the way I have in mind. Any help is much appreciated.
Upvotes: 3
Views: 7503
Reputation: 545588
The reason is what looks like a leading space before the numbers:
> ir.pres2014$Rouhani
[1] " 1052345" " 885693" " 384751" " 1017516" " 519412" " 175608" …
Just remove that as well before the conversion. The situation is complicated by the fact that this character isn’t actually a space, it’s something else:
mystery_char = substr(ir.pres2014$Rouhani[1], 1, 1)
charToRaw(mystery_char)
# [1] c2 a0
I have no idea where it comes from but it needs to be replaced:
str_replace_all(x, rawToChar(as.raw(c(0xc2, 0xa0))), '')
Furthermore, you can simplify your code by applying the same transformation to all your columns at once:
mystery_char = rawToChar(as.raw(c(0xc2, 0xa0)))
to_replace = sprintf('[,%s]', mystery_char)
clean_numbers = function (x) as.numeric(str_replace_all(x, to_replace, ''))
ir.pres2014 = ir.pres2014 %>% mutate_each(funs(clean_numbers), -province)
Upvotes: 4