Reputation: 945
I have a one-dimensional integer space consisting of arbitrary intervals, each defined by its begin and end. I would like to select consecutive integer intervals with a specific intra-length and inter-length.
An integer interval is a set of consecutive increasing integers, defined by a begin integer and an end integer. Some intervals in the initial set are completely contained in others or partially overlap them.
I describe my question using the following dummy example.
(1) The data I have (an integer space with integer intervals defined by their begin and end):
integer.space <- data.frame(
  begin = c(1,5,6,15,31,51,102),
  end   = c(7,9,13,21,49,52,108)
)
(2) What I want is to select consecutive integer intervals with an intra-length of 3 (each selected interval covers 3 integers) and an inter-length of 2 (2 integers are skipped between selected intervals), and to output the selected intervals as begin and end. I would like to select as many intervals as possible.
begin, end
1,3
6,8
11,13
16,18
31,33
36,38
41,43
46,48
102,104
Upvotes: 1
Views: 304
Reputation: 5207
I would do this in several steps:
1) Reduce the integer.space to nonoverlapping intervals.
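Step 1 is not spelled out in code here, so the following is only a minimal base-R sketch of one way to do it. The helper name `merge_intervals` is hypothetical, and it assumes a data frame with `begin` and `end` columns where `begin <= end` in every row.

```r
# Hypothetical helper: merge overlapping integer intervals using base R only.
# Assumes columns `begin` and `end`, with begin <= end in every row.
merge_intervals <- function(space) {
  space <- space[order(space$begin, space$end), ]
  # Largest end seen in all earlier rows (none for the first row).
  prev.max.end <- c(-Inf, head(cummax(space$end), -1))
  # A new component starts where an interval begins after all earlier ones end.
  # Adjacent intervals such as 1..3 and 4..6 stay separate; compare against
  # prev.max.end + 1 instead if they should be merged too.
  component <- cumsum(space$begin > prev.max.end)
  data.frame(
    begin = as.vector(tapply(space$begin, component, min)),
    end   = as.vector(tapply(space$end, component, max))
  )
}

integer.space <- data.frame(
  begin = c(1, 5, 6, 15, 31, 51, 102),
  end   = c(7, 9, 13, 21, 49, 52, 108)
)
merged <- merge_intervals(integer.space)
merged
```

For the question's data this should leave five disjoint pieces: 1..13, 15..21, 31..49, 51..52 and 102..108.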
2) Create a collection of intervals, and shift them so that they start at start points of the disjoint pieces of the integer space:
intra <- 3
inter <- 2
intervals <- data.frame(begin=seq(from=min(integer.space$begin),to=max(integer.space$end),by=intra+inter))
# each candidate covers `intra` integers, so its end is begin + intra - 1
intervals$end <- intervals$begin + intra - 1
for (k in 2:nrow(integer.space)) {
  # does a candidate begin in the gap just before component k?
  shift <- (intervals$begin>integer.space$end[k-1]) & (intervals$begin<integer.space$begin[k])
  if (any(shift)) {
    shift.ind <- min(which(shift))
    # move that candidate and all later ones so it starts where component k starts
    offset <- integer.space$begin[k] - intervals$begin[shift.ind]
    rows <- shift.ind:nrow(intervals)
    intervals[rows,] <- intervals[rows,] + offset
  }
}
3) Remove those that lie outside the integer space
goodbegins <- sapply(intervals$begin, function (x) {
  any( (x>=integer.space$begin) & (x<=integer.space$end) )
} )
goodends <- sapply(intervals$end, function (x) {
  any( (x>=integer.space$begin) & (x<=integer.space$end) )
} )
intervals <- intervals[goodbegins&goodends,]
intervals
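The filter in step 3 tests the begin and the end of each candidate separately, so a candidate whose two endpoints fall in different pieces of the space could slip through. A possible stricter check, sketched below, requires both endpoints to lie in the same merged piece. The `components` and `candidates` data frames are illustrative assumptions: the first holds the disjoint pieces of the question's data, the second is made up to show the difference.

```r
# Disjoint pieces of the integer space (values for the question's data).
components <- data.frame(
  begin = c(1, 15, 31, 51, 102),
  end   = c(13, 21, 49, 52, 108)
)

# Made-up candidates; 13..15 straddles the hole at 14.
candidates <- data.frame(begin = c(11, 13, 16), end = c(13, 15, 18))

# Keep a candidate only if one single component contains it entirely.
inside <- mapply(
  function(b, e) any(b >= components$begin & e <= components$end),
  candidates$begin, candidates$end
)
kept <- candidates[inside, ]
kept
```

Here 13..15 is dropped even though 13 and 15 each belong to the space, because 14 does not.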
Upvotes: 1
Reputation: 263421
Partial steps: I think you first want to define the continuous sequences. The one condition you did not put in your test case was a completely overlapped sequence.
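ints2 <- integer.space # assumption: ints2 starts as a copy of the question's data
ints2 <- ints2[c(1:3,3,4:7),] # duplicate row 3 to make room for an extra interval
ints2[4,] <- c(8,10) # an interval completely contained in 6..13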
require(IRanges) # from BioConductor repository
x <- IRanges(start = ints2$begin, width=1+ints2$end-ints2$begin)
asNormalIRanges(x)
#--------------
NormalIRanges of length 5
    start end width
[1]     1  13    13
[2]    15  21     7
[3]    31  49    19
[4]    51  52     2
[5]   102 108     7
Further progress: To generate the sequence of 2,3,2,3,2,3... within overlapping ranges you can use:
# cumsum(c(start, rep(c(2,3), 1+(end-start)%/%5)))
But then you need to trim the sequence when it overshoots the "end":
seqcand <- c(cumsum(c(31, rep(c(2,3), 1+(49-31)%/%5))), 49)
seqcand[ 1: (min(which(seqcand > 49, arr.ind=TRUE))-1)]
# [1] 31 33 36 38 41 43 46 48
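The trimming idea above can be wrapped up and applied to every normalized range. This is only a sketch under stated assumptions: the helper name `tile_range` is hypothetical, `normal` is typed in by hand from the ranges printed earlier, and begins are paired with ends so that a trailing begin without its end is never kept.

```r
# Hypothetical helper: tile one range [start, end] with blocks of `intra`
# integers separated by gaps of `inter` integers, dropping a final block
# that would overshoot `end`. Assumes start <= end.
tile_range <- function(start, end, intra = 3, inter = 2) {
  begins <- seq(from = start, to = end, by = intra + inter)
  ends <- begins + intra - 1
  keep <- ends <= end
  data.frame(begin = begins[keep], end = ends[keep])
}

# Normalized ranges, copied from the IRanges output above.
normal <- data.frame(
  begin = c(1, 15, 31, 51, 102),
  end   = c(13, 21, 49, 52, 108)
)

# Tile each range separately and stack the results.
selected <- do.call(rbind, Map(tile_range, normal$begin, normal$end))
selected
```

Because every range is tiled from its own start, the second range should yield 15..17 rather than the 16..18 listed in the question, and 51..52 should yield nothing since it is shorter than 3 integers.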
Upvotes: 1