user5264244
user5264244

Reputation: 57

Getting the number of occurances in a range R

I looked up all possible questions&answers about how to find a number of counts in a column,when the start time of a variable 2 (V2) and 3 (V3) falls within the range of + 25 sec and - 25 sec from the start time of variable 1 (V1).

E.g.

var start

V1  268.523
V1  296.986
V1  306.701
V1  311.586
V1  342.755
V1  358.539
V2  337.968
V2  339.808
V2  340.948
V2  357.278
V2  358.718
V3  297.936
V3  300.156
V3  307.734
V3  311.378
V3  339.046

E.g. If 1st (V1) starts at 268.525 sec, the range of +25sec and - 25sec is from 293.523 until 243.523. If the start of V2 and V3 fits in this "time window" it should be counted as 1.

I would appreciate, if someone could give me a hint how to get the information, I from the data set.

Upvotes: 1

Views: 143

Answers (1)

David Arenburg
David Arenburg

Reputation: 92302

It seems to me that you want something as the following. I've separated your data into two data sets. df1 where var == "V1" and df2 where var != "V1". Then I've set +-25 ranges within df1 to match against and added a row index in order to know which row was matched against df2 afterwards. Then I've keyed both data set by matching column and ran foverlaps in order to find overlap ranges. Finally, you can aggregate by both the index and the matched variable names and the dcast the data and join afterwards

You will probably need the development version on GH (v 1.9.5+), see here

library(data.table) # v 1.9.5+
df1 <- setDT(df)[var == "V1"]
df2 <- df[var != "V1"]

df1[, `:=`(from = start - 25L, to = start + 25L, indx = .I)]
setkey(df1, from, to)

df2[, end := start]
setkey(df2, start, end)

res <- foverlaps(df2, df1)[, .(start = toString(i.start), .N), by = .(indx, i.var)]
res <- dcast(res, indx ~ i.var, value.var = c("N", "start"))

setkey(df1, indx)
setkey(res, indx)[df1]

#    indx N_V2 N_V3                                    start_V2                           start_V3 var   start    from      to
# 1:    1   NA   NA                                          NA                                 NA  V1 268.523 243.523 293.523
# 2:    2   NA    4                                          NA 297.936, 300.156, 307.734, 311.378  V1 296.986 271.986 321.986
# 3:    3   NA    4                                          NA 297.936, 300.156, 307.734, 311.378  V1 306.701 281.701 331.701
# 4:    4   NA    4                                          NA 297.936, 300.156, 307.734, 311.378  V1 311.586 286.586 336.586
# 5:    5    5    1 337.968, 339.808, 340.948, 357.278, 358.718                            339.046  V1 342.755 317.755 367.755
# 6:    6    5    1 337.968, 339.808, 340.948, 357.278, 358.718                            339.046  V1 358.539 333.539 383.539

Data

df <- structure(list(var = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("V1", "V2", "V3"
), class = "factor"), start = c(268.523, 296.986, 306.701, 311.586, 
342.755, 358.539, 337.968, 339.808, 340.948, 357.278, 358.718, 
297.936, 300.156, 307.734, 311.378, 339.046)), .Names = c("var", 
"start"), class = "data.frame", row.names = c(NA, -16L))

Upvotes: 4

Related Questions