Reputation: 95
My data frame looks like this:
4 8
6 9
1 2
5 7
10 14
3 9
Here the first column is the start and the second column is the end of a measure. I want to return the indices of the rows that overlap a specific row. For example, for row 1 the indices would be 2, 4 and 6, as these overlap it. I need to make this comparison very frequently, so an efficient solution would be great.
Note that I am looking not only for partial overlap but also for complete overlap (e.g. 3 9).
Upvotes: 1
Views: 669
Reputation: 92300
Here's a possible solution using the foverlaps() function from the data.table package.
Set column names and pick the row index:
library(data.table)
cols <- c("start", "end")
indx <- 1L
Convert your data to a data.table object, set the column names, then separate the specific row from the rest of the data and key it (this is an essential step; check ?foverlaps for more).
setnames(setDT(df), cols)
temp <- setkeyv(df[indx], cols)
Run the foverlaps function. You can choose which type of overlap you want with the type parameter:
# xid indexes rows of df[-indx]; adding 1 maps back to df only because indx is 1
foverlaps(df[-indx], temp, which=TRUE,
          type="any", nomatch=0L)$xid + 1
## [1] 2 4 6
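If the row of interest is not always the first one, the steps above can be wrapped up roughly as follows. This is only a sketch: overlapping_rows is a hypothetical helper name (not part of data.table), and it assumes the columns are already called start and end. It maps the xid values back to the original row numbers instead of adding 1, and passes type straight through to foverlaps.

```r
library(data.table)

dt <- data.table(start = c(4L, 6L, 1L, 5L, 10L, 3L),
                 end   = c(8L, 9L, 2L, 7L, 14L, 9L))

# Hypothetical helper (not a data.table function): indices of the rows of
# `dt` that overlap row `i`. Assumes columns named "start" and "end".
overlapping_rows <- function(dt, i, type = "any") {
  cols  <- c("start", "end")
  query <- setkeyv(dt[i], cols)   # dt[i] is a copy, so dt itself stays unkeyed
  hits  <- foverlaps(dt[-i], query, by.x = cols,
                     type = type, which = TRUE, nomatch = 0L)
  # hits$xid counts rows of dt[-i]; translate back to row numbers of dt
  sort(seq_len(nrow(dt))[-i][hits$xid])
}

overlapping_rows(dt, 1L)                   # any kind of overlap with row 1
overlapping_rows(dt, 1L, type = "within")  # only rows lying inside row 1
overlapping_rows(dt, 4L)                   # works for other rows as well
```

With type = "within", only rows that lie completely inside the chosen row are returned, which is narrower than what the question asks for, so "any" is probably the one you want here.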
Upvotes: 2
Reputation: 13122
You could use the "IRanges" package:
library(IRanges)
# use the queryHits() accessor rather than the slot; the result includes row 1 itself
queryHits(findOverlaps(IRanges(DF$V1, DF$V2), IRanges(DF$V1[1], DF$V2[1])))
#[1] 1 2 4 6
Or generate all overlaps at once and subset later:
# drop.self is the current argument name; older IRanges releases called it ignoreSelf
overls = findOverlaps(IRanges(DF$V1, DF$V2), drop.self = TRUE)
split(subjectHits(overls), queryHits(overls))
subjectHits(overls)[queryHits(overls) == 1]
#[1] 2 4 6
"DF":
DF = structure(list(V1 = c(4L, 6L, 1L, 5L, 10L, 3L), V2 = c(8L, 9L,
2L, 7L, 14L, 9L)), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA,
-6L))
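For a single row, the same check can also be written in plain base R, which may help to see what "overlap" means here. A minimal sketch, assuming closed intervals (shared endpoints count as overlapping) and the V1/V2 column names of "DF"; overlaps_row is a hypothetical helper name, not a function from IRanges:

```r
DF <- data.frame(V1 = c(4L, 6L, 1L, 5L, 10L, 3L),
                 V2 = c(8L, 9L, 2L, 7L, 14L, 9L))

# Hypothetical helper: indices of rows whose closed interval [V1, V2]
# overlaps the interval in row i (row i itself is excluded).
overlaps_row <- function(d, i) {
  # two closed intervals overlap when each one starts before the other ends
  hit <- d$V1 <= d$V2[i] & d$V2 >= d$V1[i]
  setdiff(which(hit), i)
}

overlaps_row(DF, 1)
#[1] 2 4 6
```

This compares the chosen row against every other row on each call, so for very frequent queries on large data the indexed approaches are likely to scale better.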
Upvotes: 2