Apollo

Reputation: 9054

What does x[is.na(x)] do in R?

I'm following the swirl tutorial, and one of the parts has a vector x defined as:

> x
 [1]  1.91177824  0.93941777 -0.72325856  0.26998371          NA          NA
 [7] -0.17709161          NA          NA  1.98079386 -1.97167684 -0.32590760
[13]  0.23359408 -0.19229380          NA          NA  1.21102697          NA
[19]  0.78323515          NA  0.07512655          NA  0.39457671  0.64705874
[25]          NA  0.70421548 -0.59875008          NA  1.75842059          NA
[31]          NA          NA          NA          NA          NA          NA
[37] -0.74265585          NA -0.57353603          NA

Then when we type x[is.na(x)] we get a vector of all NAs:

> x[is.na(x)]
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

Why does this happen? My confusion is that is.na(x) itself returns a vector of length 40 with TRUE or FALSE in each entry, depending on whether that entry of x is NA or not. Why does "wrapping" this vector with x[ ] suddenly subset to the NAs themselves?

Upvotes: 0

Views: 8356

Answers (1)

smci

Reputation: 33950

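This is called logical indexing. It's a very common and neat R idiom.

Yes, is.na(x) gives a boolean ("logical") vector of the same length as your vector.

When you use a logical vector as the index inside x[ ], R keeps the elements of x at the positions where the index is TRUE and drops the rest. Here the index is TRUE exactly where x is NA, so you get back only the NAs.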
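To make the mechanics concrete, here is a minimal sketch on a small made-up vector (not the one from the question; the names mask, na_part and same_part are just for illustration). It shows the logical vector first, then what indexing with it returns:

```r
# Small made-up vector, purely for illustration
x <- c(1.5, NA, -0.7, NA, 0.3)

# Step 1: is.na() returns a logical vector of the same length as x
mask <- is.na(x)
print(mask)        # FALSE TRUE FALSE TRUE FALSE

# Step 2: a logical index keeps only the positions where mask is TRUE
na_part <- x[mask]
print(na_part)     # NA NA

# The same selection written with integer positions via which()
same_part <- x[which(mask)]
print(same_part)   # NA NA
```

So x[is.na(x)] is just the two steps above written in one expression.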

So x[is.na(x)] selects all the NA entries in x. On its own that is pointless; it becomes useful when you assign to it, replacing the NAs with some other value, e.g. to impute the median (or anything else):

 x[is.na(x)] <- median(x, na.rm=TRUE)

Notes:

  • x[!is.na(x)], conversely, accesses all non-NA entries in x
  • compare also the na.omit(x) function, which is clunkier: its result carries an extra na.action attribute recording which positions were dropped
  • The way R's builtin functions historically do (or don't) handle NAs (by default or customizably) is a patchwork-quilt mess; that's why the x[is.na(x)] idiom is so crucial
  • many useful functions (mean, median, sum, sd) are NA-aware, i.e. they support an na.rm=TRUE option to ignore NA values; cor has no na.rm and takes a use= argument instead (e.g. use="complete.obs"). See here. Also for how to define table_, mode_, clamp_
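The notes above can be checked with a short sketch. The vector y and the variable names are made up for illustration; only base R functions are used:

```r
# Made-up data, purely for illustration
y <- c(2, NA, 4, NA, 9)

# Non-NA entries via logical indexing: a plain numeric vector
kept <- y[!is.na(y)]
print(kept)                          # 2 4 9

# na.omit() returns the same values plus an "na.action" attribute
omitted <- na.omit(y)
print(attr(omitted, "na.action"))    # positions 2 and 4 were dropped

# Without na.rm the NA propagates; with na.rm = TRUE it is ignored
print(mean(y))                       # NA
print(mean(y, na.rm = TRUE))         # 5

# Imputing the median, done on a copy so y stays unchanged
y_filled <- y
y_filled[is.na(y_filled)] <- median(y, na.rm = TRUE)
print(y_filled)                      # 2 4 4 4 9
```

Working on a copy (y_filled) is only a choice made for this example, so that the original vector remains available for comparison.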

Upvotes: 3
