Skip NA in data.table by

Question

I'd like to use data.table but would like skip the calculation of the j part if the by corresponds to missing (NA):

Here is an example data.table

library(data.table)
DT <- data.table(y=10, g=c(1,1,1,2,2,2,2,2,NA,NA))

It looks like this

Now I'd like to do the by= on g and the two rows 9 and 10 will be lumped together because they have the same value NA.

> DT[,.N, by=g]
    g N
1:  1 3
2:  2 5
3: NA 2

I'd like to keep the NA line in the output but would want to skip the calculate part in the result, ie., get the output, where N is empty when g is NA

> DT[,.N, by=g]
    g N
1:  1 3
2:  2 5
3: NA NA

I thought I could access the value of g through .GRP but that only gives the group index and not the value. Is it possible to make the calculation conditional on the missing status of the by variable?

Uwe · Accepted Answer

You may try this one:

DT[, .N * NA^is.na(g), by = g]

    g V1
1:  1  3
2:  2  5
3: NA NA

It is an algebraic version of Henrik's if ... else ... clause. It uses the fact that NA^0 returns 1 while NA^1 returns NA and that FALSE and TRUE can be coerced to 0 and 1, resp.

If you want to control the column name:

DT[, .(n = .N * NA^is.na(g)), by = g]

    g  n
1:  1  3
2:  2  5
3: NA NA

Alternatively, if above appears to tricky you can resort to data.table chaining (thanks to Sotos for bringing this up):

DT[, .N, by = g][is.na(g), N := NA][]

This will change the value of N after aggregation.

Skip NA in data.table by

Answers (1)

Related Questions