Reputation: 1118
I am hoping to understand how to use data.table to calculate a count given all levels of a particular categorical variable that do not match the value for the record.
Take the follow data.table.
df <- data.table(var1 = c('dog','cat','dog','cat','dog','dog','dog'),
var2 = c(1,5,90,95,91,110,8),
var3 = c('lamp','lamp','lamp','table','table','table','table'))
I would like to calculate the count of values that fall within a range and do not share the same value for var1
.
This is related to Count of values within specified range of value in each row using data.table. To quote the answer from @Jaap, the following code allows for producing a count within a range.
df[, var2withinrange := df[.(var2min = var2 - 5, var2plus = var2 + 5)
, on = .(var2 >= var2min, var2 <= var2plus)
, .N
, by = .EACHI][, N]][]
In attempting to expand this answer, I had success in requiring an exact match for var1
with the following:
df[, var2withinrange := df[.(var2min = var2 - 5, var2plus = var2 + 5, var1 = var1)
, on = .(var2 >= var2min, var2 <= var2plus, var1 = var1)
, .N
, by = .EACHI][, N]][]
The code below is my attempt at counting when var1
holds a value that is not equal to the var1
value in the given row, but this code fails.
df[, var2withinrange := df[.(var2min = var2 - 5, var2plus = var2 + 5, var1 = var1)
, on = .(var2 >= var2min, var2 <= var2plus, var1 != var1)
, .N
, by = .EACHI][, N]][]
How can a "not equal to" type operator be added? A data.table
answer is preferable, but of course a solution in dplyr
or really any alternative would be appreciated!
Upvotes: 3
Views: 335
Reputation: 25225
In this particular case, you can do the following:
df[.(var2min = var2 - 5, var2plus = var2 + 5, v1=var1)
, on = .(var2 >= var2min, var2 <= var2plus)
, sum(v1 != x.var1)
, by = .EACHI]
output:
var2 var2 V1
1: -4 6 1
2: 0 10 2
3: 85 95 1
4: 90 100 2
5: 86 96 1
6: 105 115 0
7: 3 13 1
In general, I think you can do an anti-join.
Upvotes: 1