Pake
Pake

Reputation: 1118

Not Equal To in Relational Count using data.table

I am hoping to understand how to use data.table to calculate a count given all levels of a particular categorical variable that do not match the value for the record.

Take the follow data.table.

df <- data.table(var1 = c('dog','cat','dog','cat','dog','dog','dog'),
             var2 = c(1,5,90,95,91,110,8),
             var3 = c('lamp','lamp','lamp','table','table','table','table'))

I would like to calculate the count of values that fall within a range and do not share the same value for var1.

This is related to Count of values within specified range of value in each row using data.table. To quote the answer from @Jaap, the following code allows for producing a count within a range.

df[, var2withinrange := df[.(var2min = var2 - 5, var2plus = var2 + 5)
                       , on = .(var2 >= var2min, var2 <= var2plus)
                       , .N
                       , by = .EACHI][, N]][]

In attempting to expand this answer, I had success in requiring an exact match for var1 with the following:

df[, var2withinrange := df[.(var2min = var2 - 5, var2plus = var2 + 5, var1 = var1)
                       , on = .(var2 >= var2min, var2 <= var2plus, var1 = var1)
                       , .N
                       , by = .EACHI][, N]][]

The code below is my attempt at counting when var1 holds a value that is not equal to the var1 value in the given row, but this code fails.

df[, var2withinrange := df[.(var2min = var2 - 5, var2plus = var2 + 5, var1 = var1)
                       , on = .(var2 >= var2min, var2 <= var2plus, var1 != var1)
                       , .N
                       , by = .EACHI][, N]][]

How can a "not equal to" type operator be added? A data.table answer is preferable, but of course a solution in dplyr or really any alternative would be appreciated!

Upvotes: 3

Views: 335

Answers (1)

chinsoon12
chinsoon12

Reputation: 25225

In this particular case, you can do the following:

df[.(var2min = var2 - 5, var2plus = var2 + 5, v1=var1)
    , on = .(var2 >= var2min, var2 <= var2plus)
    , sum(v1 != x.var1)
    , by = .EACHI]

output:

   var2 var2 V1
1:   -4    6  1
2:    0   10  2
3:   85   95  1
4:   90  100  2
5:   86   96  1
6:  105  115  0
7:    3   13  1

In general, I think you can do an anti-join.

Upvotes: 1

Related Questions