I think it should be possible to do the following with non-equi joins, but I can't manage to make it work. It is sort of an extension of a question I asked a few weeks back: Fast way to find min in groups after excluding observations using R.
I have a data set of applications. If your score is above the cutoff, you are admitted. Now I want to identify which applications are strictly dominated, i.e. where someone has ranked a choice lower than another choice with a lower cutoff (margin), and will thus never be admitted to that option.
In other words: compare the cutoff c at a specific row with all rows in the same group (same admission_group and individual) that have a lower prio number, and set dominated = TRUE if there is a higher-prioritized (lower prio) choice with a lower cutoff.
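To make the rule concrete, here is a minimal brute-force sketch for a single group. The helper is_dominated and the vector name cutoff (standing in for column c) are hypothetical, introduced only for illustration; it simply checks every better-ranked row for a lower cutoff.

```r
# Hypothetical helper, for illustration only: brute-force check of the
# dominance rule within one admission_group/individual group.
# Row j is dominated if some row with a lower prio number has a lower cutoff.
is_dominated <- function(prio, cutoff) {
  vapply(seq_along(prio), function(j) {
    higher_ranked <- prio < prio[j]          # choices ranked above row j
    any(cutoff[higher_ranked] < cutoff[j])   # is any of them easier to get into?
  }, logical(1))
}

# Group X for individual A from the example data below
prio   <- c(1, 2, 4, 5, 6, 7, 8)
cutoff <- c(20, 16, 19, 20, 21, 11, 22)
is_dominated(prio, cutoff)
# expected: FALSE FALSE TRUE TRUE TRUE FALSE TRUE
```

For the prio 1 row there is nothing ranked above it, and any() over an empty vector is FALSE, so it is never flagged.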
The following code works but is pretty darn slow:
library(data.table)

dt <- data.table(prio = c(c(1,2,4,5,6,7,8), c(1,2,4,5), c(1,2,4,5,6,7,8), c(1,2,4,10,13)),
                 c = c(c(20,16,19,20,21,11,22), c(1.5, 1.3, 1.7, 1.2), c(20,16,19,20,21,11,22), c(123,332,121,334,335)),
                 admission_group = c(rep("X", 7), rep("Y", 4), rep("X", 7), rep("Z", 5)),
                 individual = c(rep("A", 11), rep("B", 12)),
                 dominated = rep(FALSE, 23))

# For each row: the lowest cutoff among rows in the same group whose prio is <= its own
dt[,
   min_c_lower_prio :=
     unname(sapply(split(outer(prio, prio, "<="),
                         rep(1:length(prio), each = length(prio))),
                   FUN = function(x) min(c[x], na.rm = TRUE))),
   by = .(admission_group, individual)
]
# A row is dominated if a better-ranked choice has a lower cutoff
dt[c > min_c_lower_prio, dominated := TRUE]
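For readers unfamiliar with the outer()/split() trick in the code above, a small illustration on a hypothetical three-row group: outer() builds an n × n logical matrix, and split() cuts it column by column, so list element j marks the rows ranked at or above row j. Building this matrix for every group means the work grows quadratically with group size, which is plausibly part of why it is slow.

```r
# Illustration of the mask used in the question's code, for one small group.
prio <- c(1, 2, 4)

# m[i, j] is TRUE when prio[i] <= prio[j]
m <- outer(prio, prio, "<=")

# rep(..., each = length(prio)) labels the column-major entries by column,
# so masks[[j]] selects the rows whose prio is at or above row j's priority.
masks <- split(m, rep(seq_along(prio), each = length(prio)))
masks
# $`1`: TRUE FALSE FALSE
# $`2`: TRUE TRUE FALSE
# $`3`: TRUE TRUE TRUE
```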
Yes, it can be done with non-equi joins:
dt[, d := dt[.SD, on = .(admission_group, individual, prio < prio, c < c),
             mult = "first", .N > 0, by = .EACHI]$V1]
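To see what the self join matches, a sketch on a hypothetical three-row table: first the matched pairs themselves, then the same join reduced to one flag per row as in the answer. In on=, the left-hand name refers to the column of x and the right-hand name to the column of i, so prio < prio means x.prio < i.prio.

```r
library(data.table)

# Hypothetical three-row table: one individual applying to one admission group.
x <- data.table(admission_group = "X", individual = "A",
                prio = c(1, 2, 4), c = c(20, 16, 19))

# Each returned row pairs a row of i with a better-ranked, cheaper row of x.
pairs <- x[x, on = .(admission_group, individual, prio < prio, c < c),
           .(i_prio = i.prio, i_c = i.c, x_prio = x.prio, x_c = x.c),
           nomatch = NULL]
pairs
# Only the prio 4 row (cutoff 19) has a match: the prio 2 row (cutoff 16).

# The answer's form: with by = .EACHI, .N is the number of matches for each
# row of i (0 when there is none), and mult = "first" keeps only the first match.
flags <- x[x, on = .(admission_group, individual, prio < prio, c < c),
           mult = "first", .N > 0, by = .EACHI]$V1
flags
# FALSE FALSE TRUE
```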
Alternatively, sort by priority and use cummin:
dt[order(prio), d2 := c > cummin(c), by=.(admission_group, individual)]
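A sketch that rebuilds the example data and checks that both approaches produce the same flags. The expected vector was worked out by hand from the dominance rule in the question; it is not output copied from the original thread.

```r
library(data.table)

# Example data from the question (without the helper columns).
dt <- data.table(prio = c(c(1,2,4,5,6,7,8), c(1,2,4,5), c(1,2,4,5,6,7,8), c(1,2,4,10,13)),
                 c = c(c(20,16,19,20,21,11,22), c(1.5, 1.3, 1.7, 1.2),
                       c(20,16,19,20,21,11,22), c(123,332,121,334,335)),
                 admission_group = c(rep("X", 7), rep("Y", 4), rep("X", 7), rep("Z", 5)),
                 individual = c(rep("A", 11), rep("B", 12)))

# After sorting by prio, cummin(c) is the lowest cutoff at this priority or any
# better one; a row is dominated when its own cutoff is strictly above it.
dt[order(prio), d2 := c > cummin(c), by = .(admission_group, individual)]

# Non-equi join version, for comparison.
dt[, d := dt[.SD, on = .(admission_group, individual, prio < prio, c < c),
             mult = "first", .N > 0, by = .EACHI]$V1]

# Expected flags, worked out by hand from the dominance rule.
expected <- c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE,   # X, A
              FALSE, FALSE, TRUE, FALSE,                     # Y, A
              FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE,   # X, B
              FALSE, TRUE, FALSE, TRUE, TRUE)                # Z, B
stopifnot(identical(dt$d2, expected), identical(dt$d, expected))
```

One difference to keep in mind: if two rows in a group could share the same prio, cummin would compare the later tied row against the earlier one, while the join (strict prio < prio) would not, so the two could disagree on ties.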