Reputation: 6191
I'd like to use data.table
but would like skip the calculation of the j
part if the by
corresponds to missing (NA
):
Here is an example data.table
library(data.table)
DT <- data.table(y=10, g=c(1,1,1,2,2,2,2,2,NA,NA))
It looks like this
> DT
y g
1: 10 1
2: 10 1
3: 10 1
4: 10 2
5: 10 2
6: 10 2
7: 10 2
8: 10 2
9: 10 NA
10: 10 NA
Now I'd like to do the by=
on g
and the two rows 9 and 10 will be lumped together because they have the same value NA
.
> DT[,.N, by=g]
g N
1: 1 3
2: 2 5
3: NA 2
I'd like to keep the NA
line in the output but would want to skip the calculate part in the result, ie., get the output, where N
is empty when g
is NA
> DT[,.N, by=g]
g N
1: 1 3
2: 2 5
3: NA NA
I thought I could access the value of g
through .GRP
but that only gives the group index and not the value. Is it possible to make the calculation conditional on the missing status of the by
variable?
Upvotes: 6
Views: 1485
Reputation: 42582
You may try this one:
DT[, .N * NA^is.na(g), by = g]
g V1 1: 1 3 2: 2 5 3: NA NA
It is an algebraic version of Henrik's if ... else ...
clause.
It uses the fact that NA^0
returns 1
while NA^1
returns NA
and that FALSE
and TRUE
can be coerced to 0
and 1
, resp.
If you want to control the column name:
DT[, .(n = .N * NA^is.na(g)), by = g]
g n 1: 1 3 2: 2 5 3: NA NA
Alternatively, if above appears to tricky you can resort to data.table chaining (thanks to Sotos for bringing this up):
DT[, .N, by = g][is.na(g), N := NA][]
This will change the value of N
after aggregation.
Upvotes: 7