jianfeng.mao

Reputation: 945

Selecting integer intervals with specific intra- and inter-length from random integer intervals

I have a one-dimensional integer space consisting of random intervals, each defined by its begin and end. I would like to select consecutive integer intervals with a specific intra-length (the length of each selected interval) and inter-length (the gap between adjacent selected intervals).

An integer interval is a set of consecutive increasing integers, defined by a begin integer and an end integer. Some intervals in the initial set are entirely contained in others or partially overlap with others.

I describe my question using the following dummy example.

(1) The data I have (an integer space with integer intervals defined by their begin and end):

integer.space <- data.frame(
                     begin=c(1,5,6,15,31,51,102), 
                     end  =c(7,9,13,21,49,52,108)
                 )

(2) What I want is to select consecutive integer intervals with an intra-length of 3 and an inter-length of 2, and output the selected intervals as begin and end. I would like to select as many integer intervals as possible.

begin, end
1,3
6,8
11,13
16,18
31,33
36,38
41,43
46,48
102,104

Upvotes: 1

Views: 304

Answers (2)

petrelharp

Reputation: 5207

I would do this in several steps:

1) Reduce the integer.space to nonoverlapping intervals.
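
Step 1 can be done in base R, for example. The sketch below is one possible way, not necessarily what was intended here. The helper name merge_intervals is hypothetical, and the sketch assumes that adjacent intervals (e.g. 1-3 and 4-6) should be merged too, since together they form one run of consecutive integers.

```r
# Hypothetical helper: collapse overlapping (and adjacent) integer intervals
# into disjoint components.
merge_intervals <- function(df) {
  df <- df[order(df$begin, df$end), ]
  # largest end point seen so far, row by row
  run_end <- cummax(df$end)
  # a new component starts where begin lies beyond every earlier end (+1 for adjacency)
  new_comp <- c(TRUE, df$begin[-1] > run_end[-nrow(df)] + 1)
  comp <- cumsum(new_comp)
  data.frame(
    begin = as.vector(tapply(df$begin, comp, min)),
    end   = as.vector(tapply(df$end, comp, max))
  )
}

integer.space <- data.frame(
  begin = c(1, 5, 6, 15, 31, 51, 102),
  end   = c(7, 9, 13, 21, 49, 52, 108)
)
merged <- merge_intervals(integer.space)
merged
```

For the example data this gives five components: 1-13, 15-21, 31-49, 51-52 and 102-108. To use it with the steps below, assign the result back to integer.space.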

2) Create a collection of intervals, and shift them so that they start at start points of the disjoint pieces of the integer space:

intra <- 3
inter <- 2
intervals <- data.frame(begin=seq(from=min(integer.space$begin),to=max(integer.space$end),by=intra+inter))
intervals$end <- intervals$begin + intra - 1  # each interval covers intra integers (same value as begin + inter here)
for (k in 2:nrow(integer.space)) {
  # does a candidate begin fall in the gap just before component k?
  shift <- (intervals$begin>integer.space$end[k-1]) & (intervals$begin<integer.space$begin[k]) 
  if (any(shift)) {
    shift.ind <- min(which(shift))
    intervals[shift.ind:nrow(intervals),] <- intervals[shift.ind:nrow(intervals),] + integer.space$begin[k] - intervals$begin[shift.ind]
  }
}

3) Remove those that lie outside the integer space

goodbegins <- sapply(intervals$begin, function (x) { 
    any( (x>=integer.space$begin) & (x<=integer.space$end) )
  } )
goodends <- sapply(intervals$end, function (x) { 
    any( (x>=integer.space$begin) & (x<=integer.space$end) )
  } )
intervals <- intervals[goodbegins&goodends,]

intervals

Upvotes: 1

IRTFM

Reputation: 263421

Partial steps: I think you first want to define the continuous sequences. The one condition you did not put in your test case was a completely overlapped sequence.

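ints2 <- integer.space  # assumption: ints2 starts as a copy of the question's data
ints2 <- ints2[c(1:3,3,4:7),]  # duplicate row 3 to make room for one extra interval
ints2[4,] <- c(8,10)  # an interval lying completely inside 6..13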

require(IRanges) # from BioConductor repository
x <- IRanges(start = ints2$begin, width=1+ints2$end-ints2$begin)
asNormalIRanges(x)
#--------------
NormalIRanges of length 5
    start end width
[1]     1  13    13
[2]    15  21     7
[3]    31  49    19
[4]    51  52     2
[5]   102 108     7

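Further progress: To generate the alternating begin/end sequence within each merged range, step by 2 (begin to end, i.e. intra-length minus 1) and then by 3 (end to next begin, i.e. inter-length plus 1), giving 2,3,2,3,2,3... You can use:

# cumsum(c(start, rep(c(2,3), 1+(end-start)%/%5)))

But then you need to trim the sequence when it overshoots the "end":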

seqcand <- c(cumsum(c(31, rep(c(2,3), 1+(49-31)%/%5))), 49)
seqcand[ 1: (min(which(seqcand > 49, arr.ind=TRUE))-1)]
# [1] 31 33 36 38 41 43 46 48
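
To go from these partial steps to the full table, one option is a greedy pass over the merged ranges. This is only a sketch under some assumptions: select_intervals is a hypothetical helper name, the merged ranges are typed in by hand from the asNormalIRanges output above, and the inter-length gap is assumed to apply across ranges as well (which is what the expected output in the question suggests, since it lists 16-18 rather than 15-17).

```r
# Merged ranges, copied from the asNormalIRanges output above
merged <- data.frame(
  begin = c(1, 15, 31, 51, 102),
  end   = c(13, 21, 49, 52, 108)
)

# Hypothetical helper: greedily pick intervals of length intra,
# keeping at least inter unused integers between consecutive picks.
select_intervals <- function(merged, intra = 3, inter = 2) {
  begins <- numeric(0)
  earliest <- -Inf  # earliest begin allowed after the previous pick
  for (i in seq_len(nrow(merged))) {
    b <- max(merged$begin[i], earliest)
    # keep picking while a full interval still fits inside range i
    while (b + intra - 1 <= merged$end[i]) {
      begins <- c(begins, b)
      b <- b + intra + inter
      earliest <- b
    }
  }
  data.frame(begin = begins, end = begins + intra - 1)
}

selected <- select_intervals(merged, intra = 3, inter = 2)
selected
```

On the example data this returns the nine intervals listed in the question (1-3, 6-8, 11-13, 16-18, 31-33, 36-38, 41-43, 46-48, 102-104). Unlike trimming the cumsum sequence, it only records a begin when the matching end also fits in the range.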

Upvotes: 1
