Reputation: 9793
I have a dataset that, just by looking at it, clearly contains NAs.
> dput(bmi.cig)
structure(list(MSI.subset.BMI = structure(c(4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 1L, 2L, 3L, 3L, 1L, 3L, 3L, 1L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("0", "1", "2",
"NA"), class = "factor"), MSI.subset.Cigarette = structure(c(3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 1L, 2L, 1L, 2L,
2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("1",
"2", "NA"), class = "factor")), .Names = c("MSI.subset.BMI",
"MSI.subset.Cigarette"), row.names = c(NA, 30L), class = "data.frame")
> head(bmi.cig)
  MSI.subset.BMI MSI.subset.Cigarette
1             NA                   NA
2             NA                   NA
3             NA                   NA
4             NA                   NA
5             NA                   NA
6             NA                   NA
I want to remove any row that contains an NA in either column, so I'm using the list-wise deletion function ld in the ForImp package. However, R is not recognizing the NA values. When I run
is.na(bmi.cig$MSI.subset.BMI)
I get
> is.na(bmi.cig$MSI.subset.BMI)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[26] FALSE FALSE FALSE FALSE FALSE
So once I use the ld function, I just get an empty dataset in return.
Upvotes: 3
Views: 19899
Reputation: 99331
As @rbatt mentions, you have the character string "NA" as a factor level. You can remove that level and get the NA entries to register as real NA values for the entire data set with
df[] <- lapply(df, function(x) {
    is.na(levels(x)) <- levels(x) == "NA"
    x
})
where df is your data set. Now test with
is.na(df)
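As a minimal sketch of the whole round trip, assuming the goal is the one stated in the question (dropping every row with a missing value), the conversion above can be followed by base R's complete.cases() instead of a package function. The data frame and its column names bmi and cig are made-up stand-ins for illustration, not the asker's data.

```r
# Hypothetical stand-in data: the "missing" entries are the literal
# string "NA" stored as a factor level, as in the question.
df <- data.frame(
  bmi = factor(c("NA", "0", "1", "NA", "2"), levels = c("0", "1", "2", "NA")),
  cig = factor(c("NA", "2", "1", "1", "NA"), levels = c("1", "2", "NA"))
)

# Turn the "NA" level into a real missing value in every column.
df[] <- lapply(df, function(x) {
  is.na(levels(x)) <- levels(x) == "NA"
  x
})

# Keep only the rows with no missing value in any column.
clean <- df[complete.cases(df), ]
clean
```

na.omit(df) should give the same rows here; complete.cases() is used only because it makes the row filter explicit.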
Upvotes: 2
Reputation: 4807
It's because the columns are factors, and one of their levels is the string "NA". For example, try
data <- structure(list(MSI.subset.BMI = structure(c(4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 1L, 2L, 3L, 3L, 1L, 3L, 3L, 1L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("0", "1", "2",
"NA"), class = "factor"), MSI.subset.Cigarette = structure(c(3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 1L, 2L, 1L, 2L,
2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("1",
"2", "NA"), class = "factor")), .Names = c("MSI.subset.BMI",
"MSI.subset.Cigarette"), row.names = c(NA, 30L), class = "data.frame")
# The column is a factor, not a numeric or logical vector
class(data[,1])
# TRUE wherever the entry is the string "NA"
data[,1]=="NA"
The NA's are actually characters (class("NA")), not class logical like class(NA).
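A short sketch of that distinction, using a small made-up factor f rather than the asker's data:

```r
class("NA")   # "character": an ordinary two-letter string
class(NA)     # "logical": R's missing-value marker
is.na("NA")   # FALSE
is.na(NA)     # TRUE

# Hypothetical factor whose middle entry is the string "NA"
f <- factor(c("0", "NA", "1"), levels = c("0", "1", "NA"))
levels(f)     # "0" "1" "NA"
is.na(f)      # FALSE FALSE FALSE
f == "NA"     # FALSE  TRUE FALSE
```

This is why is.na() returns FALSE everywhere in the question: as far as R is concerned, nothing in those columns is missing.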
Upvotes: 5