R: Extract ranges if they harbor values from a vector (many ranges and large vector)

Question

I have a table of ranges (start, stop), which looks something like this:

ID	start	stop
x1	351525	352525
x2	136790	136990
x3	74539	74739
x4	478181	478381
...	...	...

I also have a vector of positions.

The data can be simulated with:

s=round(runif(50,0,500000),0)
# ranges:
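# (+200 is arbitrary; the difference may be more or less than that, but stop is always higher than start)
# data.frame rather than cbind, so start and stop stay numeric instead of being coerced to character
ranges=data.frame(ID=paste0("x",1:50), start=s, stop=s+200)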

# positions
pos=round(runif(5000,0,500000),0)

I want to select all IDs which have at least one position within their range.

I could loop through ranges and pos:

library(dplyr)
selected.IDs <- c()
for(r in 1:nrow(ranges)){
  for(p in 1:length(pos)){
    if(between(pos[p],left = ranges[r,2], right  = ranges[r,3])){
      selected.IDs <- append(selected.IDs, ranges[r,1])
      break
    } else{next}
  }
}

That works fine (I think). However, the 'ranges' object has 83,000 rows and there are 180,000 positions, so it takes a long time to loop through all of them.

Does anyone have an idea how to do this without a loop?

Thanks

Paul Endymion · Accepted Answer

I usually do this using overlap joins with data.table::foverlaps.

library(data.table)

s <- round(runif(50,0,500000),0)
# ranges:
# (+200 is arbitrary; the difference may be more or less than that, but stop is always higher than start)
ranges <- data.table(ID=paste0("x",1:50), start=s, stop=s+200)

# positions
pos <- round(runif(5000,0,500000),0)
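# zero-width intervals: foverlaps treats intervals as closed, so stop = pos + 1 would also match ranges starting at pos + 1
pos <- data.table(start = pos, stop = pos)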

setkey(pos, start, stop)
setkey(ranges, start, stop)

res <- foverlaps(ranges, pos, nomatch = 0)
# a range that contains several positions appears once per match, so deduplicate
selected.IDs <- unique(res$ID)
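
As a quick sanity check of the overlap join above, here is a minimal sketch on a tiny hand-made data set that can be verified by eye. The values and the names ranges_small, pos_small, pos_dt, hits, selected_small and selected_base are made up for illustration and are not part of the original question. It assumes each position is modeled as a zero-width interval, so only positions inside the closed interval [start, stop] count as hits. The last few lines cross-check the result with base R's findInterval, which is a different technique from foverlaps and is included only for comparison.

```r
library(data.table)

# Hypothetical toy data: three ranges and four positions
ranges_small <- data.table(ID    = c("a", "b", "c"),
                           start = c(10L, 50L, 100L),
                           stop  = c(20L, 60L, 110L))
pos_small <- c(5L, 15L, 18L, 99L)

# Model each position as a zero-width interval [p, p]
pos_dt <- data.table(start = pos_small, stop = pos_small)
setkey(pos_dt, start, stop)

# Overlap join; nomatch = 0 drops ranges that contain no position
hits <- foverlaps(ranges_small, pos_dt,
                  by.x = c("start", "stop"), nomatch = 0)

# "a" is matched twice (positions 15 and 18), so deduplicate
selected_small <- unique(hits$ID)
print(selected_small)  # "a"

# Cross-check with base R: a range has a hit if more sorted positions
# are <= stop than are < start
sp <- sort(pos_small)
n_le_stop  <- findInterval(ranges_small$stop, sp)
n_lt_start <- findInterval(ranges_small$start, sp, left.open = TRUE)
selected_base <- ranges_small$ID[n_le_stop > n_lt_start]
print(selected_base)   # "a"
```

Position 99 sits just outside range c (100 to 110), so c is not selected. If positions were stored as [p, p + 1] instead, c would be reported as well, because the intervals are compared as closed.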
