adamski

Reputation: 71

Using data.table to find the min in a group for rows that fulfil row-specific criteria

I think the following should be possible with non-equi joins, but I can't manage to make it work. It is sort of an extension to this question I asked a few weeks back: Fast way to find min in groups after excluding observations using R.

I have a data set of applications. If your score is above the cutoff, you are admitted. Now I want to identify which applications are strictly dominated, i.e. cases where someone has ranked a choice below another choice that has a lower cutoff and will thus never be admitted to the lower-ranked choice.

In other words: compare the cutoff c in each row with all rows in the same group (admission_group and individual) that have a lower prio number, and set dominated = TRUE if any higher-prioritized (lower prio) choice has a lower cutoff.
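As a minimal illustration of this rule (a base R sketch for a single group, not the data.table solution being asked for), the check can be written row by row. The vector names `cutoff` and `dominated_example` are made up for the illustration, and the values are just example numbers:

```r
# Illustrative values for one (admission_group, individual) group
prio   <- c(1, 2, 4, 5, 6, 7, 8)
cutoff <- c(20, 16, 19, 20, 21, 11, 22)

# A row is dominated if any row with a smaller prio number
# (i.e. a higher-prioritized choice) has a strictly lower cutoff
dominated_example <- vapply(seq_along(prio), function(i) {
  any(cutoff[prio < prio[i]] < cutoff[i])
}, logical(1))

data.frame(prio, cutoff, dominated = dominated_example)
```

Here the choices with prio 4, 5, 6 and 8 come out as dominated, because a higher-prioritized choice already has a lower cutoff.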

The following code works but is pretty darn slow:

library(data.table)
dt <- data.table(prio = c(c(1,2,4,5,6,7,8), c(1,2,4,5), c(1,2,4,5,6,7,8), c(1,2,4,10,13)),
                 c = c(c(20,16,19,20,21,11,22), c(1.5, 1.3, 1.7, 1.2), c(20,16,19,20,21,11,22), c(123,332,121,334,335)),
                 admission_group = c(rep("X", 7), rep("Y", 4), rep("X", 7), rep("Z", 5)),
                 individual = c(rep("A", 11), rep("B", 12)),
                 dominated = rep(FALSE, 23))

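# For each row, take the min of c over the rows in the same group whose prio is <= this row's prio.
# Column k of outer(prio, prio, "<=") flags those rows for row k; split() extracts one column per row.
dt[,
    min_c_lower_prio :=
        unname(sapply(split(outer(prio, prio, "<="),
                            rep(seq_along(prio),
                                each = length(prio))),
                      FUN = function(x) min(c[x], na.rm = TRUE))),
    by = .(admission_group, individual)
]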

dt[c > min_c_lower_prio, dominated := TRUE]

Upvotes: 1

Views: 59

Answers (1)

Frank

Reputation: 66819

Yes, it can be done with non-equi joins:

dt[, d := dt[.SD, on=.(admission_group, individual, prio < prio, c < c), mult="first", 
  .N > 0, by=.EACHI]$V1]
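To see what the join is doing, here is a small self-contained sketch on made-up data (the table `toy` and the vector `n_better` are hypothetical names, not part of the question). It counts matches with `.N` instead of using `mult="first"`, which should give the same flag as far as I can tell:

```r
library(data.table)

# Hypothetical toy data: one individual, two admission groups
toy <- data.table(
  admission_group = c("X", "X", "X", "Y", "Y"),
  individual      = "A",
  prio            = c(1, 2, 3, 1, 2),
  c               = c(20, 16, 19, 1.5, 1.3)
)

# Self-join: for every row of the inner `toy` (the i argument), count the rows
# in the same group with a strictly lower prio AND a strictly lower c.
# by = .EACHI returns one count per row of i, in i's row order (0 if no match).
n_better <- toy[toy,
                on = .(admission_group, individual, prio < prio, c < c),
                .N, by = .EACHI]$N

# A row is dominated if at least one such "better" row exists
toy[, dominated := n_better > 0]
toy
```

Only the third row (prio 3, c 19) is flagged, since the prio 2 choice in the same group has the lower cutoff 16.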

Alternately, sort by priority and use cummin:

dt[order(prio), d2 := c > cummin(c), by=.(admission_group, individual)]
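A small sketch of why the ordering matters, again on made-up data (`toy2` is a hypothetical name) whose rows are deliberately not sorted by prio:

```r
library(data.table)

# Hypothetical toy data, deliberately NOT sorted by prio
toy2 <- data.table(
  admission_group = "X",
  individual      = "A",
  prio            = c(4, 1, 2, 7, 5),
  c               = c(19, 20, 16, 11, 20)
)

# Rows are visited in prio order within each group, so cummin(c) is the lowest
# cutoff among the choices ranked at or above the current one. The result is
# assigned back to the original row positions.
toy2[order(prio), dominated := c > cummin(c),
     by = .(admission_group, individual)]
toy2
```

The rows with prio 4 and 5 are flagged, because the prio 2 choice has the lower cutoff 16. One caveat, assuming prio can have ties within a group: cummin also compares a row against earlier rows with the same prio, whereas the join with prio < prio does not, so the two approaches could differ in that case.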

Upvotes: 3
