I am a grad student using R and have been reading the other Stack Overflow answers about removing rows that contain NA from data frames. I have tried both na.omit and complete.cases. Both appear to remove the rows with NA, but when I run summary() on the data frame it still reports the NAs. Are the rows with NA actually removed, or am I doing this wrong?
na.omit(Perios)
summary(Perios)
Perios[complete.cases(Perios),]
summary(Perios)
The error is that you didn't assign the output of na.omit back to Perios! na.omit returns a cleaned copy and leaves the original data frame unchanged, so you need:
Perios <- na.omit(Perios)
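Here is a minimal sketch of the difference. It uses a small made-up stand-in for Perios. The Age column and all of the values are hypothetical, not the asker's data.

```r
# Hypothetical stand-in for Perios (Age column and values invented for illustration)
Perios <- data.frame(Periostitis = c(1, NA, 3), Age = c(20, 35, NA))

# Without assignment, na.omit only prints a cleaned copy; Perios is unchanged
na.omit(Perios)
n_before <- nrow(Perios)  # still 3 rows

# Assigning the result back is what actually drops the NA rows
Perios <- na.omit(Perios)
n_after <- nrow(Perios)   # 1 row left

summary(Perios)  # no NA counts reported any more
```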
If you know which column the NAs occur in, then you can just do
Perios[!is.na(Perios$Periostitis),]
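The following sketch uses hypothetical data, and Age is an invented column. It shows that this approach filters only on Periostitis, so NAs in other columns are kept. As with na.omit, you still need to assign the result if you want to keep it.

```r
# Hypothetical data: Periostitis comes from the answer, Age is invented
Perios <- data.frame(Periostitis = c(1, NA, 3), Age = c(20, 35, NA))

# Keep rows where Periostitis is not NA; the NA in Age (row 3) survives
kept <- Perios[!is.na(Perios$Periostitis), ]
kept
```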
or more generally:
Perios[!is.na(Perios$colA) & !is.na(Perios$colD) & ... ,]
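If the chain of is.na tests gets long, you can apply complete.cases to just the columns you care about and get the same rows. The sketch below is under that assumption. The columns colA, colB and colD and their values are hypothetical stand-ins for the placeholders above.

```r
# Hypothetical data; colA and colD mirror the placeholder names, colB is extra
Perios <- data.frame(colA = c(1, NA, 3, 4),
                     colB = c(NA, 2, 3, 4),
                     colD = c(5, 6, NA, 8))

# The chained is.na form, spelled out for two columns
by_hand <- Perios[!is.na(Perios$colA) & !is.na(Perios$colD), ]

# Same filter via complete.cases on a column subset; NAs in colB are ignored
by_cc <- Perios[complete.cases(Perios[, c("colA", "colD")]), ]

identical(by_hand, by_cc)
```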
Then as a general safety tip for R, throw in an na.fail to assert it worked:
na.fail(Perios) # trust, but verify! Paranoia is healthy.
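na.fail returns its argument unchanged when there are no NAs and signals an error otherwise. Here is a small sketch on hypothetical data that wraps the failing call in tryCatch so the script keeps running.

```r
# Hypothetical data with NAs
Perios <- data.frame(a = c(1, 2, NA), b = c(3, NA, NA))

# After na.omit, na.fail simply hands the data frame back
clean <- na.fail(na.omit(Perios))

# On the unfiltered data na.fail raises an error; turn it into a flag here
failed <- tryCatch({ na.fail(Perios); FALSE },
                   error = function(e) TRUE)
failed  # TRUE because Perios still contains NAs
```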
is.na is not the proper function here. To filter the data you want complete.cases, which is equivalent to function(x) !apply(is.na(x), 1, any), or na.omit. That is, you want all rows where there are no NA values.
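The sketch below checks that equivalence on the same x used in the console session. It also shows the apply(..., all) variant for contrast. That variant flags rows that are entirely NA, not complete rows.

```r
x <- data.frame(a = c(1, 2, NA), b = c(3, NA, NA))

# A row is complete when none of its cells is NA
by_apply   <- !apply(is.na(x), 1, any)
by_rowsums <- rowSums(is.na(x)) == 0
by_cc      <- complete.cases(x)

# For contrast: rows where every cell is NA, which is a different question
all_na <- apply(is.na(x), 1, all)

by_cc   # TRUE FALSE FALSE
all_na  # FALSE FALSE TRUE
```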
> x <- data.frame(a=c(1,2,NA), b=c(3,NA,NA))
> x
   a  b
1  1  3
2  2 NA
3 NA NA
> x[complete.cases(x),]
  a b
1 1 3
> na.omit(x)
  a b
1 1 3
Then assign the result back to x to keep it (x <- na.omit(x)).
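Here is a sketch of assigning back. It also shows that na.omit keeps the original row names and records the dropped rows in an "na.action" attribute.

```r
x <- data.frame(a = c(1, 2, NA), b = c(3, NA, NA))

# Overwrite x with the filtered copy so the change persists
x <- na.omit(x)

# Original row names are kept, and the omitted rows are recorded
rownames(x)                      # "1"
as.vector(attr(x, "na.action"))  # 2 3
```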
complete.cases returns a logical vector with one element per row of the input data frame. is.na, on the other hand, returns a logical matrix with one element per cell. That matrix is not suitable for selecting complete rows. However, using it as an index returns all the non-NA values as a vector:
> is.na(x)
         a     b
[1,] FALSE FALSE
[2,] FALSE  TRUE
[3,]  TRUE  TRUE
> x[!is.na(x)]
[1] 1 2 3
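To make the shape difference concrete, here is a short sketch comparing the two on the same x.

```r
x <- data.frame(a = c(1, 2, NA), b = c(3, NA, NA))

dim(is.na(x))              # 3 2: one logical per cell
length(complete.cases(x))  # 3: one logical per row

# Row filtering with complete.cases keeps the data frame shape...
rows_kept <- x[complete.cases(x), ]

# ...whereas indexing with the is.na matrix flattens to a plain vector
values_kept <- x[!is.na(x)]
```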