Conditional Unique Counting in R data.table

Question

I would like to count the number of conflicts in my dataset by group. I feel like there has to be an easy way to do this in data.table, but can't seem to figure it out. I've created a dummy variable to tell me if there is a conflict for each row of the data.table:

testDT <- data.table(Name = c(rep('A',6),rep('B',5)),
                     Division = c(rep(11,6),rep(12,5)),
                     ID = c(205,205,NA,201,201,201,203,203,203,204,NA),
                     Conflict = c(0,0,0,1,1,1,1,1,1,1,0))

I need to count the unique number of non-NA IDs that have a conflict flag of 1 and apply that count in a new column to each Name-Division grouping. This is what the answer should be:

testDT[, Count := c(rep(1,6),rep(2,5))]

    Name Division  ID Conflict Count
 1:    A       11 205        0     1
 2:    A       11 205        0     1
 3:    A       11  NA        0     1
 4:    A       11 201        1     1
 5:    A       11 201        1     1
 6:    A       11 201        1     1
 7:    B       12 203        1     2
 8:    B       12 203        1     2
 9:    B       12 203        1     2
10:    B       12 204        1     2
11:    B       12  NA        0     2

I've been thinking about some usage of sum(!is.na(unique(ID))), but I'm not sure how to conditionally count the unique values without creating criteria in the i section of the data.table (Conflict == 1).

akuiper · Accepted Answer

You can subset the ID variable by conditions within the data.table [] and then count the unique values:

library(data.table)
testDT[, Count := uniqueN(ID[!is.na(ID) & Conflict == 1]), by=.(Name, Division)]
testDT
#     Name Division  ID Conflict Count
#  1:    A       11 205        0     1
#  2:    A       11 205        0     1
#  3:    A       11  NA        0     1
#  4:    A       11 201        1     1
#  5:    A       11 201        1     1
#  6:    A       11 201        1     1
#  7:    B       12 203        1     2
#  8:    B       12 203        1     2
#  9:    B       12 203        1     2
# 10:    B       12 204        1     2
# 11:    B       12  NA        0     2

Or following your logic:

testDT[, Count := sum(!is.na(unique(ID[Conflict == 1]))), by=.(Name, Division)]

Conditional Unique Counting in R data.table

Answers (2)

Related Questions