Reputation: 57
I looked up all possible questions&answers about how to find a number of counts in a column,when the start time of a variable 2 (V2
) and 3 (V3
) falls within the range of + 25 sec and - 25 sec from the start time of variable 1 (V1
).
E.g.
var start
V1 268.523
V1 296.986
V1 306.701
V1 311.586
V1 342.755
V1 358.539
V2 337.968
V2 339.808
V2 340.948
V2 357.278
V2 358.718
V3 297.936
V3 300.156
V3 307.734
V3 311.378
V3 339.046
E.g. If 1st (V1
) starts at 268.525 sec, the range of +25sec and - 25sec is from 293.523 until 243.523. If the start of V2
and V3
fits in this "time window" it should be counted as 1.
I would appreciate, if someone could give me a hint how to get the information, I from the data set.
Upvotes: 1
Views: 143
Reputation: 92302
It seems to me that you want something as the following. I've separated your data into two data sets. df1
where var == "V1"
and df2
where var != "V1"
. Then I've set +-25 ranges within df1
to match against and added a row index in order to know which row was matched against df2
afterwards. Then I've keyed both data set by matching column and ran foverlaps
in order to find overlap ranges. Finally, you can aggregate by both the index
and the matched variable names and the dcast
the data and join afterwards
You will probably need the development version on GH (v 1.9.5+), see here
library(data.table) # v 1.9.5+
df1 <- setDT(df)[var == "V1"]
df2 <- df[var != "V1"]
df1[, `:=`(from = start - 25L, to = start + 25L, indx = .I)]
setkey(df1, from, to)
df2[, end := start]
setkey(df2, start, end)
res <- foverlaps(df2, df1)[, .(start = toString(i.start), .N), by = .(indx, i.var)]
res <- dcast(res, indx ~ i.var, value.var = c("N", "start"))
setkey(df1, indx)
setkey(res, indx)[df1]
# indx N_V2 N_V3 start_V2 start_V3 var start from to
# 1: 1 NA NA NA NA V1 268.523 243.523 293.523
# 2: 2 NA 4 NA 297.936, 300.156, 307.734, 311.378 V1 296.986 271.986 321.986
# 3: 3 NA 4 NA 297.936, 300.156, 307.734, 311.378 V1 306.701 281.701 331.701
# 4: 4 NA 4 NA 297.936, 300.156, 307.734, 311.378 V1 311.586 286.586 336.586
# 5: 5 5 1 337.968, 339.808, 340.948, 357.278, 358.718 339.046 V1 342.755 317.755 367.755
# 6: 6 5 1 337.968, 339.808, 340.948, 357.278, 358.718 339.046 V1 358.539 333.539 383.539
Data
df <- structure(list(var = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("V1", "V2", "V3"
), class = "factor"), start = c(268.523, 296.986, 306.701, 311.586,
342.755, 358.539, 337.968, 339.808, 340.948, 357.278, 358.718,
297.936, 300.156, 307.734, 311.378, 339.046)), .Names = c("var",
"start"), class = "data.frame", row.names = c(NA, -16L))
Upvotes: 4