Nathaniel Saxe

Reputation: 1555

Do these number ranges overlap each other?

I've been using two different target prediction programs to predict binding sites on genes, and using R to process the results that I get.

The problem is that the programs give a different number of targets per gene, and the locations are slightly different. What I was trying to do was to see if these sites are the same or, at least, given the Start and Stop positions, whether the ranges overlap between programs.

Say I have two programs, X and Y.

X predicts two sites: x1 holds the start positions of both sites and x2 holds the stop positions. The same goes for Y, which predicts four sites.

x1<-c(1521,1259)
x2<-c(1544,1282)

y1<-c(1825,1522,1259,362)
y2<-c(1848,1543,1282,384)

So both of the X sites overlap sites in Y, and I would like to output those positions in a table:

|   x1     |   x2     |   y1     |   y2     |
|   1521   |   1544   |   1522   |   1543   |
|   1259   |   1282   |   1259   |   1282   |

What I was originally thinking was that, if I only had one site for each program, the following would tell me whether they overlap (the stop position of y should be at least the start position of x, and the stop position of x should be at least the start position of y):

x1 <= y2 && y1 <= x2

I'm not sure how I could do the same for my problem, at least, not without writing a lot of loops and ifs.

Upvotes: 0

Views: 62

Answers (1)

Martin Morgan

Reputation: 46876

The IRanges package (and GenomicRanges for genomic data, when chromosome and possibly strand are important) allows you to define ranges

library(IRanges)
x <- IRanges(x1, x2)
y <- IRanges(y1, y2)

and ask questions about them

y %over% x     # any type of overlap
y %within% x   # y entirely contained in a range of x (shared endpoints count)
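To get the table of matching positions asked for in the question, one option is findOverlaps(), which reports which x range overlaps which y range. The following is a minimal sketch using the data from the question; the name overlap_table is just an illustrative choice, not anything defined by the package.

```r
library(IRanges)

# Data from the question
x1 <- c(1521, 1259); x2 <- c(1544, 1282)
y1 <- c(1825, 1522, 1259, 362); y2 <- c(1848, 1543, 1282, 384)

x <- IRanges(x1, x2)
y <- IRanges(y1, y2)

# One hit per overlapping (x, y) pair, stored as indices into x and y
hits <- findOverlaps(x, y)

# Use the indices to pull the matching coordinates into one table
# (overlap_table is a hypothetical name chosen for this sketch)
overlap_table <- data.frame(
  x1 = start(x)[queryHits(hits)],
  x2 = end(x)[queryHits(hits)],
  y1 = start(y)[subjectHits(hits)],
  y2 = end(y)[subjectHits(hits)]
)
overlap_table
```

Each row pairs one x site with one y site it overlaps, so an x site that overlaps several y sites would appear on several rows.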

See ?findOverlaps for more detail, as well as the package vignettes (from the landing pages, above), these publications (a, b) for a general introduction, and the Bioconductor support site if the ranges infrastructure seems useful.
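If installing a Bioconductor package is not an option, the single-pair test from the question (x1 <= y2 && y1 <= x2) can probably be vectorised in base R with outer(), avoiding explicit loops. This is only a sketch under that assumption: it compares every x site against every y site, so it suits small inputs rather than genome-scale data, and overlap_table is again an illustrative name.

```r
# Data from the question
x1 <- c(1521, 1259); x2 <- c(1544, 1282)
y1 <- c(1825, 1522, 1259, 362); y2 <- c(1848, 1543, 1282, 384)

# overlap[i, j] is TRUE when x site i overlaps y site j:
# x starts no later than y stops, and x stops no earlier than y starts
overlap <- outer(x1, y2, "<=") & outer(x2, y1, ">=")

# Row/column indices of the TRUE cells (row = x site, col = y site)
idx <- which(overlap, arr.ind = TRUE)

# overlap_table is a hypothetical name chosen for this sketch
overlap_table <- data.frame(
  x1 = x1[idx[, "row"]],
  x2 = x2[idx[, "row"]],
  y1 = y1[idx[, "col"]],
  y2 = y2[idx[, "col"]]
)
overlap_table
```

The comparison matrix has length(x1) * length(y1) cells, which is why the IRanges approach is the better fit once the number of sites grows.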

Upvotes: 1
