Reputation: 14192
I have these four vectors with numbers being times in seconds:
maxtime = 240.0333
mintime = 181.1333
times1 = c(179.1333, 183.8000, 192.3000, 194.0000, 196.2500, 198.8333, 203.4333, 217.8167)
times2 = c(183.1333, 187.8000, 196.3000, 198.0000, 200.2500, 202.8333, 207.4333, 221.8167)
You will notice that times1
and times2
have the same length. Each corresponding element is 4 seconds apart. That is, times1
is 4 seconds earlier than times2
.
The best way to illustrate my questions is to plot these data like this:
library(ggplot2)
library(dplyr)
dfplot<- data.frame(ymin=times1, ymax=times2)
data.frame(x=c(rep("min",length(times1)), rep("max",length(times1))),
y=c(times1,times2),
id=1:length(times1)) %>%
ggplot(., aes(id,y,group=id)) +
geom_path(lwd=2) +
coord_flip() +
geom_hline(yintercept = as.numeric(mintime), lty=2,color='red', lwd=1)+
geom_hline(yintercept = as.numeric(maxtime), lty=2,color='red', lwd=1)+
geom_rect(data=dfplot,aes(xmin=0,ymin=ymin,xmax=length(times1),ymax=ymax,fill="red"),alpha=0.2,inherit.aes=FALSE) +
theme(panel.background = element_blank(), plot.background = element_blank())
What I want to do is to calculate the time that is covered by the intervals between each pair of elements in times1
and times2
. These are represented by the black horizontal lines and also by the red rectangles. As you can see, several of these might overlap. Effectively, I want to calculate what proportion of the time between the two red dashed lines is covered by the black lines/red rectangles and what proportion is not (i.e. the white gaps).
I hope this makes sense.
Upvotes: 2
Views: 143
Reputation: 93813
Using the GenomicRanges
library from BioConductor, with a hat tip to this previous answer: https://stackoverflow.com/a/27576114/496803
Since it only deals with integer data, you have to multiply your values out to cover the portion after the decimal point.
df <- data.frame(times1=times1*10000, times2=times2*10000, id=1)
total <- data.frame(times1=mintime*10000,times2=maxtime*10000, id=1)
#source("http://bioconductor.org/biocLite.R")
#biocLite("GenomicRanges")
library(GenomicRanges)
dfR <- makeGRangesFromDataFrame(
df, start.field="times1", end.field="times2",
seqnames.field="id"
)
totalR <- makeGRangesFromDataFrame(
total, start.field="times1", end.field="times2",
seqnames.field="id"
)
result <- intersect(totalR, dfR)
result
#GRanges object with 5 ranges and 0 metadata columns:
# seqnames ranges strand
# <Rle> <IRanges> <Rle>
# [1] 1 [1811333, 1831333] *
# [2] 1 [1838000, 1878000] *
# [3] 1 [1923000, 2028333] *
# [4] 1 [2034333, 2074333] *
# [5] 1 [2178167, 2218167] *
sum(round(as.data.frame(result)$width/10000,3))
#[1] 24.533
Upvotes: 1
Reputation: 3635
The following code seems to work for me.
The idea of the code is to conglomerate the overlapping segments into bigger segments, compute them and then subsequently calculate their lengths.
library(dplyr)
library(assertthat)
# Make sure times1 is sorted
assert_that(identical(sort(times1), times1))
# Create segments from specified times
segments <- lapply(seq_along(times1),
function(x) {
assert_that( times1[[x]] < times2[[x]] )
list(c(times1[[x]],times2[[x]]))
})
DASHED_BEGIN <- mintime
DASHED_END <- maxtime
# Cut off the segments based on dashed lines
segments_cut_off <- lapply(segments, function(xx) {
x <- xx[[1]]
# Is it within dashed interval?
if ((x[2] < DASHED_BEGIN) || (x[1] > DASHED_END))
return (NULL) # No
# Yes
(list(c(max(DASHED_BEGIN,x[1]),
min(DASHED_END,x[2]))))
}) %>% Filter(f = Negate(is.null))
# Function for determining the union of two segments
seg_union <- function(xx,yy) {
prev_x <- xx[1:length(xx)-1]
x <- xx[[length(xx)]]
y <- yy[[1]]
# Do they intersect
if (x[[2]]<y[[1]] || y[[2]] < x[[1]]) {
# No. Return each separately as well as
# attach the previous segments
return (c(prev_x,list(x,y)))
}
# Yes. Calculate the union
# (and attach the previous segments too)
(c(prev_x,
list(c(min(x[[1]],y[[1]]),
max(x[[2]],y[[2]])))
))
}
# Create the full list of conglomerated segments
union_lst <- Reduce(f = seg_union, x = segments_cut_off)
union_lst
which gives me:
[[1]]
[1] 181.1333 183.1333
[[2]]
[1] 183.8 187.8
[[3]]
[1] 192.3000 202.8333
[[4]]
[1] 203.4333 207.4333
[[5]]
[1] 217.8167 221.8167
Now we just add up their lengths
vapply(union_lst, function(x) (x[2] - x[1]),
FUN.VALUE = numeric(1)) %>%
sum
Upvotes: 3