R: is.na() does not pick up NA value

Question

I have a dataset that, just by looking at it, clearly contains NAs.

 > dput(bmi.cig)
structure(list(MSI.subset.BMI = structure(c(4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 1L, 2L, 3L, 3L, 1L, 3L, 3L, 1L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("0", "1", "2", 
"NA"), class = "factor"), MSI.subset.Cigarette = structure(c(3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 1L, 2L, 1L, 2L, 
2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("1", 
"2", "NA"), class = "factor")), .Names = c("MSI.subset.BMI", 
"MSI.subset.Cigarette"), row.names = c(NA, 30L), class = "data.frame")


> head(bmi.cig)
  MSI.subset.BMI MSI.subset.Cigarette
1             NA                   NA
2             NA                   NA
3             NA                   NA
4             NA                   NA
5             NA                   NA
6             NA                   NA
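The head() output above is easy to misread. In a printed data frame, a factor level labelled with the string "NA" shows up as NA, while a genuinely missing factor value shows up as <NA>. Here is a small comparison on toy data (not the asker's):

```r
# Toy factor: the first element is the *string* "NA", the second is a real missing value
f <- factor(c("NA", NA, "1"))

# Printed in a data frame, the string level appears as NA and the missing value as <NA>
print(data.frame(f = f))

# Only the genuine missing value is detected
is.na(f)  # FALSE TRUE FALSE
```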

I want to remove any row that contains an NA in either column, so I'm using the listwise deletion function ld() from the ForImp package. However, R does not recognize these values as NA.

When I check with is.na(bmi.cig$MSI.subset.BMI), every value comes back FALSE:

    > is.na(bmi.cig$MSI.subset.BMI)
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[26] FALSE FALSE FALSE FALSE FALSE
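One quick way to confirm the cause is to inspect the factor's levels: if "NA" is listed among them, those entries are labels, not missing values. This is a sketch on a cut-down, hypothetical copy of the MSI.subset.BMI column (the name `bmi` is illustrative):

```r
# Cut-down stand-in for MSI.subset.BMI: "NA" is declared as an ordinary level
bmi <- factor(c("NA", "NA", "0", "1", "2"), levels = c("0", "1", "2", "NA"))

levels(bmi)            # "0" "1" "2" "NA": the string "NA" is just another level
"NA" %in% levels(bmi)  # TRUE
anyNA(bmi)             # FALSE: there are no real missing values
sum(bmi == "NA")       # 2 entries carry the "NA" label
```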

So when I apply ld, I just get an empty dataset back.
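For comparison, listwise deletion can be written in base R with complete.cases(), which keeps only the rows that have no missing value in any column. This is a base-R sketch of the same idea, not the ForImp ld() function. Note that complete.cases() also counts only real NA values as missing, so on bmi.cig as posted it would keep all 30 rows.

```r
# Toy data frame with *real* missing values (NA), not the string "NA"
df <- data.frame(
  bmi = factor(c(NA, "0", "1", "2")),
  cig = factor(c(NA, "1", NA, "2"))
)

# complete.cases() is TRUE for rows where no column is NA
keep <- complete.cases(df)
df_complete <- df[keep, , drop = FALSE]

nrow(df_complete)  # 2: rows 2 and 4 survive
```

na.omit(df) selects the same rows, and it also records which rows it dropped in an "na.action" attribute.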

rbatt · Accepted Answer

It's because the columns are factors, and the level in question is the string "NA". For example, try:

data <- structure(list(MSI.subset.BMI = structure(c(4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 1L, 2L, 3L, 3L, 1L, 3L, 3L, 1L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("0", "1", "2", 
"NA"), class = "factor"), MSI.subset.Cigarette = structure(c(3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 1L, 2L, 1L, 2L, 
2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("1", 
"2", "NA"), class = "factor")), .Names = c("MSI.subset.BMI", 
"MSI.subset.Cigarette"), row.names = c(NA, 30L), class = "data.frame")

class(data[,1])   # "factor"

data[,1]=="NA"    # TRUE wherever the label is the string "NA"

The NAs are actually character labels (class("NA") is "character"), not the logical missing value NA (class(NA) is "logical"), so is.na() returns FALSE for them.
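To make such rows removable, the "NA" labels have to become genuine missing values. Here is a minimal sketch, assuming every factor column uses the literal label "NA" for missingness. The helper name `na_level_to_na` and the toy data frame `toy` are hypothetical. On the real data, the same lapply() line would be applied to bmi.cig.

```r
# Hypothetical helper: turn a factor's "NA" label into real NA and drop the empty level
na_level_to_na <- function(x) {
  if (is.factor(x)) {
    x[which(x == "NA")] <- NA  # assigning NA to factor elements creates real missing values
    x <- droplevels(x)         # remove the now-unused "NA" level
  }
  x
}

# Hypothetical two-column data frame mimicking bmi.cig
toy <- data.frame(
  MSI.subset.BMI       = factor(c("NA", "0", "1", "NA")),
  MSI.subset.Cigarette = factor(c("NA", "1", "NA", "2"))
)

toy[] <- lapply(toy, na_level_to_na)  # toy[] keeps the data.frame structure

is.na(toy$MSI.subset.BMI)   # TRUE FALSE FALSE TRUE
toy[complete.cases(toy), ]  # only row 2 has no missing value
```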
