Carmellose

Reputation: 5088

R's tapply with null function

I'm having trouble understanding what the tapply function does when the FUN argument is NULL.

The documentation says:

If FUN is NULL, tapply returns a vector which can be used to subscript the multi-way array tapply normally produces.

For example, what does the following example from the documentation do?

ind <- list(c(1, 2, 2), c("A", "A", "B"))
tapply(1:3, ind) #-> the split vector

I don't understand the results:

[1] 1 2 4

Thanks.

Upvotes: 5

Views: 407

Answers (1)

Iaroslav Domin

Reputation: 2718

If you run tapply with a function specified (not NULL), say sum, as in the help page, you'll see that the result is a 2-dimensional array with NA in one cell:

res <- tapply(1:3, ind, sum)
res
   A  B
 1 1 NA
 2 2  3

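It means that one combination of factors, namely (1, B), is absent. When FUN is NULL, tapply returns, for each element of the input, the linear index of the cell in this array that the element belongs to. In this example each element falls into its own cell, so the result matches the indices of the non-NA cells. To check this: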

> which(!is.na(res))
[1] 1 2 4
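
To make the "subscript" wording of the documentation more concrete, here is a minimal sketch reusing ind from the question. The names idx and g are only illustrative. Since the NULL result holds one linear (column-major) index per input element, using it to index the array gives back, for each element, the value of its own cell:

```r
ind <- list(c(1, 2, 2), c("A", "A", "B"))

res <- tapply(1:3, ind, sum)   # 2 x 2 array, cell (1, B) is empty
idx <- tapply(1:3, ind)        # one linear index per element of 1:3

idx        # 1 2 4
res[idx]   # the cell value each element belongs to

# When several elements share a group, the indices repeat as well
g <- c("a", "b", "a", "b")
tapply(1:4, g)                        # 1 2 1 2
tapply(1:4, g, sum)[tapply(1:4, g)]   # each element's group sum
```

This is why the result has the same length as the input rather than one entry per group.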

Note that the specified function can itself return NA, as in the following toy example:

> f <- function(x){
      if(x[[1]] == 1) return(NA)
      return(sum(x))
  }
> tapply(1:3, ind, f)
   A  B
1 NA NA
2  2  3

So, in general, an NA in the result does not necessarily mean that a factor combination is absent.
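
One possible way to tell the two cases apart, sketched here with the same ind and f as above: the indices returned with FUN = NULL point only at cells that received at least one element, whatever the function returns for them. The name present is just an illustrative helper, not part of tapply.

```r
ind <- list(c(1, 2, 2), c("A", "A", "B"))
f <- function(x) {
  if (x[[1]] == 1) return(NA)
  sum(x)
}

res <- tapply(1:3, ind, f)

# Logical array of the same shape, TRUE where a cell actually holds data
present <- array(FALSE, dim = dim(res), dimnames = dimnames(res))
present[unique(tapply(1:3, ind))] <- TRUE

present                # FALSE only for the absent combination (1, B)
is.na(res) & present   # TRUE where f itself returned NA, here (1, A)
```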

Upvotes: 3
