I'm following the swirl tutorial, and one of the parts has a vector x defined as:
> x
[1] 1.91177824 0.93941777 -0.72325856 0.26998371 NA NA
[7] -0.17709161 NA NA 1.98079386 -1.97167684 -0.32590760
[13] 0.23359408 -0.19229380 NA NA 1.21102697 NA
[19] 0.78323515 NA 0.07512655 NA 0.39457671 0.64705874
[25] NA 0.70421548 -0.59875008 NA 1.75842059 NA
[31] NA NA NA NA NA NA
[37] -0.74265585 NA -0.57353603 NA
Then when we type x[is.na(x)], we get a vector of all NAs:
> x[is.na(x)]
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Why does this happen? My confusion is that is.na(x) itself returns a logical vector of length 40, with TRUE or FALSE in each entry depending on whether that entry of x is NA. Why does "wrapping" this vector with x[ ] suddenly subset to the NAs themselves?
This is called logical indexing. It's a very common and neat R idiom.
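Yes, is.na(x) gives a boolean ("logical") vector of the same length as your vector. Using that logical vector as an index is called logical indexing: x[i] keeps exactly the elements of x at the positions where i is TRUE and drops those where it is FALSE. Since is.na(x) is TRUE only at the NA positions, x[is.na(x)] can only ever return NAs (here, 20 of them).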
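A minimal sketch of the mechanism, assuming a small hypothetical five-element vector in place of the tutorial's x: the logical mask picks out positions, and which() shows the equivalent positional indices.

```r
# Hypothetical stand-in for the tutorial's x (values chosen for illustration)
x <- c(1.9, NA, -0.7, NA, 0.3)

mask <- is.na(x)       # logical vector, same length as x
print(mask)            # FALSE TRUE FALSE TRUE FALSE

# Logical indexing keeps the elements where mask is TRUE
print(x[mask])         # NA NA

# Equivalent positional indexing: which() turns TRUE positions into indices
print(which(mask))     # 2 4
print(x[which(mask)])  # NA NA
```

Either form selects the same elements; the logical form is the usual idiom because it needs no intermediate index vector.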
By construction, x[is.na(x)] selects only the NA entries of x, so reading it on its own tells you little. It becomes useful when you assign to it, replacing the NAs with some other value, e.g. imputing the median (or anything else):
x[is.na(x)] <- median(x, na.rm = TRUE)
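A short sketch of that imputation on a hypothetical vector (not the tutorial's data). The right-hand side is evaluated first, so the median is computed from the non-NA values before any NA is overwritten.

```r
x <- c(1.9, NA, -0.7, NA, 0.3)  # hypothetical example vector

med <- median(x, na.rm = TRUE)   # median of 1.9, -0.7, 0.3 is 0.3
x[is.na(x)] <- med               # assign only into the NA positions
print(x)                         # 1.9 0.3 -0.7 0.3 0.3
```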
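Notes:
- x[!is.na(x)] accesses all non-NA entries in x, an alternative to the na.omit(x) function, which is way more clunky
- (… x[is.na(x)] idiom is so crucial)
- Many common functions (mean, median, sum, sd) are NA-aware, i.e. they support an na.rm = TRUE option to ignore NA values; cor() has no na.rm argument and instead takes use = "complete.obs" (or "pairwise.complete.obs"). See here, also for how to define table_, mode_, clamp_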
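The notes above can be sketched on the same kind of hypothetical vector: negating the mask keeps the non-NA values, na.omit() returns the same values plus an "na.action" attribute, and na.rm = TRUE (or use = for cor) drops NAs inside the summary functions. The vector y is a made-up companion for the cor() call.

```r
x <- c(1.9, NA, -0.7, NA, 0.3)   # hypothetical example vector

# Negating the mask keeps the non-NA entries
kept <- x[!is.na(x)]
print(kept)                       # 1.9 -0.7 0.3

# na.omit() gives the same values but attaches an "na.action" attribute
omitted <- na.omit(x)
print(attr(omitted, "na.action")) # positions that were dropped (2 and 4)

# NA-aware summaries: without na.rm, a single NA makes the result NA
print(mean(x))                    # NA
print(mean(x, na.rm = TRUE))      # 0.5
print(sum(x, na.rm = TRUE))       # 1.5

# cor() uses `use =` rather than na.rm; y is a hypothetical second vector
y <- c(2.0, 1.0, -1.0, 0.5, 0.2)
print(cor(x, y, use = "complete.obs"))
```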