triub

Reputation: 95

Find indices of overlapping ranges in R

My data frame looks like this:

4 8
6 9
1 2
5 7
10 14
3 9

The first column is the start and the second column is the end of a measure. I now want to return the indices of the rows that partly overlap a specific row. For example, for row 1 the indices would be 2, 4 and 6, as these overlap it. I need to make this comparison very frequently, so an efficient solution would be great.

Note that I am looking not only for partial overlaps but also for complete overlaps, such as (3 9).
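Treating each row as a closed range, two ranges [s1, e1] and [s2, e2] overlap (partly or completely) exactly when s1 <= e2 and s2 <= e1. A minimal vectorized base-R sketch of that test, assuming the default column names V1 (start) and V2 (end); the function name overlapping_rows is hypothetical:

```r
# Hypothetical helper: indices of rows whose [V1, V2] range overlaps row i.
# Two closed ranges overlap iff each one starts no later than the other ends,
# which covers partial overlaps as well as complete containment.
overlapping_rows <- function(df, i) {
  hits <- which(df$V1 <= df$V2[i] & df$V2 >= df$V1[i])
  setdiff(hits, i)  # drop the row itself
}

df <- data.frame(V1 = c(4, 6, 1, 5, 10, 3),
                 V2 = c(8, 9, 2, 7, 14, 9))
overlapping_rows(df, 1)
## [1] 2 4 6
```

Each call scans all rows once, so it is linear in the number of rows; with <= the test also counts ranges that only touch at an endpoint.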

Upvotes: 1

Views: 669

Answers (2)

David Arenburg

Reputation: 92300

Here's a possible solution using the foverlaps() function from the data.table package.

Set column names and pick the row index:

library(data.table)
cols <- c("start", "end")
indx <- 1L

Convert your data to a data.table, set the column names, and extract the specific row into a separate table keyed on start and end. Keying is essential here: foverlaps() requires its second argument to be keyed (see ?foverlaps for details).

setnames(setDT(df), cols)
temp <- setkeyv(df[indx], cols)

Run foverlaps(). The type argument selects which kind of overlap counts ("any", "within", "start", "end" or "equal"); "any" matches partial as well as complete overlaps.

others <- seq_len(nrow(df))[-indx]  # original row numbers of df[-indx]
others[foverlaps(df[-indx], temp, which = TRUE,
                 type = "any", nomatch = 0L)$xid]
## [1] 2 4 6
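Since the comparison is needed often, another option is to join the table against itself once and then look up any row cheaply. A sketch assuming the start/end naming from above; dt, id, other_id and pairs are illustrative names. Note that foverlaps(..., which = TRUE) would report y-side positions in the keyed (sorted) order, so the sketch carries an explicit row id instead:

```r
library(data.table)

# Example data from the question
dt <- data.table(start = c(4L, 6L, 1L, 5L, 10L, 3L),
                 end   = c(8L, 9L, 2L, 7L, 14L, 9L))
dt[, id := .I]  # remember the original row numbers

# foverlaps() needs the second table keyed on (start, end); copy() leaves dt unsorted.
# Rename the id on that side so the joined columns are easy to tell apart.
y <- setkeyv(copy(dt), c("start", "end"))
setnames(y, "id", "other_id")

# Self-join: one row per overlapping pair, then drop each row's match with itself
pairs <- foverlaps(dt, y, type = "any", nomatch = 0L)
pairs <- pairs[id != other_id]

# Rows overlapping row 1 (original numbering)
sort(pairs[id == 1L, other_id])
## [1] 2 4 6
```

The number of pairs can grow quadratically when many ranges overlap each other, so this trades memory for fast repeated lookups.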

Upvotes: 2

alexis_laz

Reputation: 13122

You could use the IRanges package (from Bioconductor):

library(IRanges)

# row 1 is returned too, since it overlaps itself
queryHits(findOverlaps(IRanges(DF$V1, DF$V2), IRanges(DF$V1[1], DF$V2[1])))
#[1] 1 2 4 6

Or generate all overlaps at once and subset later:

overls = findOverlaps(IRanges(DF$V1, DF$V2), drop.self = TRUE)
split(subjectHits(overls), queryHits(overls))  # overlapping rows, one element per row with any overlap

subjectHits(overls)[queryHits(overls) == 1]
#[1] 2 4 6

The data used above ("DF"):

DF = structure(list(V1 = c(4L, 6L, 1L, 5L, 10L, 3L), V2 = c(8L, 9L, 
2L, 7L, 14L, 9L)), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA, 
-6L))

Upvotes: 2
