The WBC column is stored as character, and I want to convert it to numeric. Although it seems simple, I am not sure why I am getting this warning: Warning message: In eval(jsub, SDenv, parent.frame()) : NAs introduced by coercion.
All of the values failed to convert to numeric, and I don't see anything abnormal in the numbers in the WBC column. Any idea?
I am using R and RStudio.
df <- structure(list(IDNUM = c(26L, 48L, 103L, 104L, 106L, 108L, 110L,
113L, 115L, 116L), WBC = c(NA, NA, " 15.72", "10.53 ", NA,
"7.07 ", "7.27 ", " 5.33 ", "7.26 ", "8.24 ")), row.names = c(NA,
10L), class = "data.frame")
library(data.table)
df <- as.data.table(df)
df[, WBC2 := as.numeric(paste(WBC))]
Warning message:
In eval(jsub, SDenv, parent.frame()) : NAs introduced by coercion
This is due to the way paste() has coerced the NAs. R has a separate missing value for each atomic type (for example NA_integer_, NA_real_ and NA_character_), but paste() turns any of them into the literal string "NA", which is no longer a missing value.
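A minimal check you can run in a plain R session (the names na_values and pasted are only illustrative):

```r
# Each atomic type has its own typed missing value
na_values <- list(NA, NA_integer_, NA_real_, NA_character_)
sapply(na_values, typeof)  # "logical" "integer" "double" "character"

# paste() converts a missing value into the two-character string "NA"
pasted <- paste(NA_character_)
is.na(NA_character_)  # TRUE: a real missing value
is.na(pasted)         # FALSE: just an ordinary string
```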
In your original data, the variable WBC is a character vector:
> df <- structure(list(IDNUM = c(26L, 48L, 103L, 104L, 106L, 108L, 110L,
113L, 115L, 116L), WBC = c(NA, NA, " 15.72", "10.53 ", NA,
"7.07 ", "7.27 ", " 5.33 ", "7.26 ", "8.24 ")), row.names = c(NA,
10L), class = "data.frame")
> str(df)
'data.frame': 10 obs. of 2 variables:
$ IDNUM: int 26 48 103 104 106 108 110 113 115 116
$ WBC : chr NA NA " 15.72" "10.53 " ...
The NAs in that column are therefore NA_character_:
> is.character(df[1,2])
[1] TRUE
> dput(df[1,2])
NA_character_
By using paste(), R converts that NA_character_ to "NA" (that is, a string containing the two letters "NA"):
> paste(df[1,2])
[1] "NA"
> dput(paste(df[1,2]))
"NA"
When you then coerce it to numeric with as.numeric(), R cannot interpret the string "NA" as a number, so it returns NA and prints a warning:
> as.numeric(paste(df[1,2]))
[1] NA
Warning message:
NAs introduced by coercion
> dput(as.numeric(paste(df[1,2])))
NA_real_
Warning message:
In dput(as.numeric(paste(df[1, 2]))) : NAs introduced by coercion
NA_real_ is how R represents a missing double. So, technically, R is warning you that while coercing those strings to double it also had to generate NAs, because (by design) it does not interpret the string "NA" as a missing value. Elements that are already NA_character_ become NA_real_ without any warning.
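To confirm that the warning comes only from the "NA" strings, and not from the spaces around the numbers, here is a small sketch using a hypothetical helper warns() built on tryCatch():

```r
# Hypothetical helper: TRUE if evaluating expr raises a warning
warns <- function(expr) {
  tryCatch({
    force(expr)  # evaluate the promise inside tryCatch
    FALSE
  }, warning = function(w) TRUE)
}

wbc <- c(NA, " 15.72", "10.53 ")

# NA_character_ and space-padded numbers convert silently
warns(as.numeric(wbc))         # FALSE
# After paste(), the first element is the string "NA", which triggers the warning
warns(as.numeric(paste(wbc)))  # TRUE
```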
In the example, you can get rid of the warning by removing paste() from your script:
> df <- as.data.table(df)
> df[, WBC2 := as.numeric(WBC)]
> str(df)
Classes ‘data.table’ and 'data.frame': 10 obs. of 3 variables:
$ IDNUM: int 26 48 103 104 106 108 110 113 115 116
$ WBC : chr NA NA " 15.72" "10.53 " ...
$ WBC2 : num NA NA 15.7 10.5 NA ...
- attr(*, ".internal.selfref")=<externalptr>
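For reference, the same fix in base R (no data.table needed) is sketched below. The helper unparsed() is hypothetical: it lists values that were present in the original column but still came back as NA, which is a quick way to spot entries containing something other than digits and ordinary spaces.

```r
df <- data.frame(
  IDNUM = c(26L, 48L, 103L, 104L, 106L, 108L, 110L, 113L, 115L, 116L),
  WBC = c(NA, NA, " 15.72", "10.53 ", NA, "7.07 ", "7.27 ", " 5.33 ", "7.26 ", "8.24 "),
  stringsAsFactors = FALSE  # keep WBC as character on older R versions
)

# Convert directly: NA_character_ becomes NA_real_ without a warning,
# and leading/trailing spaces are tolerated by as.numeric()
df$WBC2 <- as.numeric(df$WBC)

# Hypothetical helper: non-missing inputs that still failed to parse
unparsed <- function(chr, num) chr[!is.na(chr) & is.na(num)]

unparsed(df$WBC, df$WBC2)  # character(0): every non-missing value converted
```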