Amer

Warning message: In eval(jsub, SDenv, parent.frame()) : NAs introduced by coercion

The WBC column is stored as character, and I want to convert it to numeric. Although this seems simple, I am not sure why I am getting this warning: Warning message: In eval(jsub, SDenv, parent.frame()) : NAs introduced by coercion. All values seem to have failed to convert to numeric. I don't see anything abnormal in the numbers in the WBC column. Any idea?

I am using R and RStudio.

df <- structure(list(
  IDNUM = c(26L, 48L, 103L, 104L, 106L, 108L, 110L, 113L, 115L, 116L),
  WBC = c(NA, NA, " 15.72", "10.53 ", NA, "7.07 ", "7.27 ", " 5.33 ", "7.26 ", "8.24 ")
), row.names = c(NA, 10L), class = "data.frame")

library(data.table)
df <- as.data.table(df)
df[, WBC2 := as.numeric(paste(WBC))]

Warning message:
In eval(jsub, SDenv, parent.frame()) : NAs introduced by coercion

Answers (1)

Giulio Centorame

This is due to the way paste() handles NAs. An NA has a different representation in R for each type (logical, integer, double, character), and paste() converts any of them to the literal string "NA".

In your original data, the variable WBC is a character vector:

> df <- structure(list(
    IDNUM = c(26L, 48L, 103L, 104L, 106L, 108L, 110L, 113L, 115L, 116L),
    WBC = c(NA, NA, " 15.72", "10.53 ", NA, "7.07 ", "7.27 ", " 5.33 ", "7.26 ", "8.24 ")
  ), row.names = c(NA, 10L), class = "data.frame")
> str(df)
'data.frame':   10 obs. of  2 variables:
 $ IDNUM: int  26 48 103 104 106 108 110 113 115 116
 $ WBC  : chr  NA NA " 15.72" "10.53 " ...

The NAs in that case are NA_character_:

> is.character(df[1,2])
[1] TRUE
> dput(df[1,2])
NA_character_

By using paste, R coerces that NA_character_ to "NA" (as in: a string containing "NA")

> paste(df[1,2])
[1] "NA"
> dput(paste(df[1,2]))
"NA"

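The same effect can be reproduced on a standalone vector. This is only a minimal sketch; the names x, pasted and kept are made up for illustration and are not part of the question's code.

```r
# Hypothetical vector mirroring the WBC column: one missing value, one padded number
x <- c(NA_character_, " 15.72")

pasted <- paste(x)         # the missing value becomes the two-letter string "NA"
kept   <- as.character(x)  # the missing value stays missing

is.na(pasted)  # FALSE FALSE
is.na(kept)    # TRUE FALSE
```

After paste(), R no longer sees a missing value at all, only ordinary text that happens to spell "NA".
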
When you then coerce that string to a number with as.numeric(), R cannot interpret "NA" as a number, so it returns NA and prints a warning:

> as.numeric(paste(df[1,2]))
[1] NA
Warning message:
NAs introduced by coercion 
> dput(as.numeric(paste(df[1,2])))
NA_real_
Warning message:
In dput(as.numeric(paste(df[1, 2]))) : NAs introduced by coercion

NA_real_ is how R represents a missing double. So R is warning you that, while coercing those strings to doubles, it had to generate NAs, because by design it does not treat the string "NA" as a missing value.
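To see the difference between a genuinely missing string and the text "NA", the two conversions can be compared side by side. This is a small sketch; the names silent and msg are illustrative, and the exact wording of the message depends on the session's language settings.

```r
# A genuinely missing string converts silently to a missing double
silent <- as.numeric(NA_character_)

# The literal text "NA" is not a number, so the conversion raises a warning;
# tryCatch() captures the warning text instead of printing it
msg <- tryCatch(
  as.numeric("NA"),
  warning = function(w) conditionMessage(w)
)

is.na(silent)   # TRUE
typeof(silent)  # "double"
msg             # "NAs introduced by coercion" in an English locale
```
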

In the example, you can get rid of the warning by simply removing paste() from your script:

> df <- as.data.table(df)
> df[, WBC2 := as.numeric(WBC)]
> str(df)
Classes ‘data.table’ and 'data.frame':  10 obs. of  3 variables:
 $ IDNUM: int  26 48 103 104 106 108 110 113 115 116
 $ WBC  : chr  NA NA " 15.72" "10.53 " ...
 $ WBC2 : num  NA NA 15.7 10.5 NA ...
 - attr(*, ".internal.selfref")=<externalptr> 
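If the warning still appears on other data after removing paste(), it can help to list which values actually fail to parse. The helper below is a hypothetical sketch, not part of data.table or base R, and the vector wbc contains made-up values chosen only to show the idea.

```r
# Hypothetical helper: return the distinct values that are present as text
# but do not parse as numbers
failed_to_convert <- function(x) {
  converted <- suppressWarnings(as.numeric(x))
  unique(x[!is.na(x) & is.na(converted)])
}

# Made-up values: padded numbers parse fine, "n/a" and "7,07" do not
wbc <- c(NA, " 15.72", "10.53 ", "n/a", "7,07")

failed_to_convert(wbc)         # "n/a" "7,07"
failed_to_convert(paste(wbc))  # "NA" "n/a" "7,07"
```

The second call shows the "NA" string produced by paste() turning up as an unparseable value, which is what triggers the warning in the question.
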
