Nikola Knezevic
Nikola Knezevic

Reputation: 799

Count the number of values in a window of a data.table per group

I'm trying to add new columns to data.table, where values in rows depend on the relative relationship of the values in the row. To be more precise, if there is a value X in a row, I would like to know how many other values are in the same column (and group), that are within X-30.

That is, given this:

DT<-data.table(
X = c(1, 2, 2, 1, 1, 2,  1, 2, 2, 1, 1, 1),
Y = c(100, 101, 133, 134, 150, 156,  190, 200, 201, 230, 233, 234),
Z = c(1, 2, 3, 4, 5, 6,  7, 8, 9, 10, 11, 12))

I would like to get a new column, with values:

N <- c(0, 0, 0, 0, 1, 1,  0, 0, 1, 0, 1, 2)

I've tried the following, but I don't get the results I could use:

DT[,list(Y,num=cumsum(Y[-.I]>DT[.I,Y]-30),Z),by=.(X)]

Any ideas how to do this?

Upvotes: 5

Views: 460

Answers (2)

Frank
Frank

Reputation: 66819

Here's another way:

DT[order(Y), N := 0:(.N-1) - findInterval(Y - 30, Y), by = X]

all.equal(DT$N,N) # TRUE

Upvotes: 4

David Arenburg
David Arenburg

Reputation: 92300

This is probably can be achieved with a rolling join (?), but here is a foverlaps alternative for now

DT[, `:=`(indx = .I, Y2 = Y - 30L, N = 0L)] # Add row index and a -30 interval
setkey(DT, X, Y2, Y) # Sort by X and the intervals (for fovelaps)
res <- foverlaps(DT, DT)[Y2 > i.Y2, .N, keyby = indx] # Run foverlaps and check what can we catch
setorder(DT, indx) # go back to the original order
DT[res$indx, N := res$N][, c("indx", "Y2") := NULL] # update results and remove cols
DT
#     X   Y  Z N
#  1: 1 100  1 0
#  2: 2 101  2 0
#  3: 2 133  3 0
#  4: 1 134  4 0
#  5: 1 150  5 1
#  6: 2 156  6 1
#  7: 1 190  7 0
#  8: 2 200  8 0
#  9: 2 201  9 1
# 10: 1 230 10 0
# 11: 1 233 11 1
# 12: 1 234 12 2

Alternately, use the which=TRUE option of foverlaps to make the overlap merge smaller:

# as above
DT[, `:=`(indx = .I, Y2 = Y - 30L, N = 0L)]
setkey(DT, X, Y2, Y)

# using which=TRUE:
res <- foverlaps(DT, DT, which=TRUE)[xid > yid, .N, by=xid]
DT[res$xid, N := res$N]
setorder(DT, indx)
DT[, c("Y2","indx") := NULL]

Upvotes: 6

Related Questions