jalapic
jalapic

Reputation: 14192

Calculate time utilized from interval data

I have these four vectors with numbers being times in seconds:

maxtime = 240.0333
mintime = 181.1333
times1 = c(179.1333, 183.8000, 192.3000, 194.0000, 196.2500, 198.8333, 203.4333, 217.8167)
times2 = c(183.1333, 187.8000, 196.3000, 198.0000, 200.2500, 202.8333, 207.4333, 221.8167)

You will notice that times1 and times2 have the same length. Each corresponding element is 4 seconds apart. That is, times1 is 4 seconds earlier than times2.

The best way to illustrate my questions is to plot these data like this:

library(ggplot2)
library(dplyr)
dfplot<- data.frame(ymin=times1, ymax=times2) 
data.frame(x=c(rep("min",length(times1)), rep("max",length(times1))), 
           y=c(times1,times2), 
           id=1:length(times1)) %>%
  ggplot(., aes(id,y,group=id)) + 
  geom_path(lwd=2) +
  coord_flip() +
  geom_hline(yintercept = as.numeric(mintime), lty=2,color='red', lwd=1)+
  geom_hline(yintercept = as.numeric(maxtime), lty=2,color='red', lwd=1)+
geom_rect(data=dfplot,aes(xmin=0,ymin=ymin,xmax=length(times1),ymax=ymax,fill="red"),alpha=0.2,inherit.aes=FALSE) +
  theme(panel.background = element_blank(), plot.background = element_blank())

enter image description here

What I want to do is to calculate the time that is covered by the intervals between each pair of elements in times1 and times2. These are represented by the black horizontal lines and also by the red rectangles. As you can see, several of these might overlap. Effectively, I want to calculate what proportion of the time between the two red dashed lines is covered by the black lines/red rectangles and what proportion is not (i.e. the white gaps).

I hope this makes sense.

Upvotes: 2

Views: 143

Answers (2)

thelatemail
thelatemail

Reputation: 93813

Using the GenomicRanges library from BioConductor, with a hat tip to this previous answer: https://stackoverflow.com/a/27576114/496803

Since it only deals with integer data, you have to multiply your values out to cover the portion after the decimal point.

df <- data.frame(times1=times1*10000, times2=times2*10000, id=1)
total <- data.frame(times1=mintime*10000,times2=maxtime*10000, id=1)

#source("http://bioconductor.org/biocLite.R")
#biocLite("GenomicRanges")
library(GenomicRanges)

dfR <- makeGRangesFromDataFrame(
 df, start.field="times1", end.field="times2", 
 seqnames.field="id"
)

totalR <- makeGRangesFromDataFrame(
 total, start.field="times1", end.field="times2", 
 seqnames.field="id"
)

result <- intersect(totalR, dfR)
result
#GRanges object with 5 ranges and 0 metadata columns:
#      seqnames             ranges strand
#         <Rle>          <IRanges>  <Rle>
#  [1]        1 [1811333, 1831333]      *
#  [2]        1 [1838000, 1878000]      *
#  [3]        1 [1923000, 2028333]      *
#  [4]        1 [2034333, 2074333]      *
#  [5]        1 [2178167, 2218167]      *

sum(round(as.data.frame(result)$width/10000,3))
#[1] 24.533

Upvotes: 1

akhmed
akhmed

Reputation: 3635

The following code seems to work for me.

The idea of the code is to conglomerate the overlapping segments into bigger segments, compute them and then subsequently calculate their lengths.

library(dplyr)
library(assertthat)

# Make sure times1 is sorted
assert_that(identical(sort(times1), times1))

# Create segments from specified times
segments <- lapply(seq_along(times1),
                   function(x) {
                     assert_that( times1[[x]] < times2[[x]] )
                     list(c(times1[[x]],times2[[x]]))
                    })

DASHED_BEGIN <- mintime
DASHED_END <-  maxtime

# Cut off the segments based on dashed lines
segments_cut_off <- lapply(segments, function(xx) {
  x <- xx[[1]]

  # Is it within dashed interval?
  if ((x[2] < DASHED_BEGIN) || (x[1] > DASHED_END))
    return (NULL) # No

  # Yes
  (list(c(max(DASHED_BEGIN,x[1]),
              min(DASHED_END,x[2]))))

}) %>% Filter(f = Negate(is.null))

# Function for determining the union of two segments
seg_union <- function(xx,yy) {
  prev_x <- xx[1:length(xx)-1]
  x <- xx[[length(xx)]]
  y <- yy[[1]]

  # Do they intersect
  if (x[[2]]<y[[1]] || y[[2]] < x[[1]]) {
    # No. Return each separately as well as 
    # attach the previous segments
    return (c(prev_x,list(x,y)))
  }

  # Yes. Calculate the union 
  # (and attach the previous segments too)
  (c(prev_x,
     list(c(min(x[[1]],y[[1]]), 
            max(x[[2]],y[[2]])))
  ))

}

# Create the full list of conglomerated segments
union_lst <- Reduce(f = seg_union, x = segments_cut_off)

union_lst

which gives me:

[[1]]
[1] 181.1333 183.1333

[[2]]
[1] 183.8 187.8

[[3]]
[1] 192.3000 202.8333

[[4]]
[1] 203.4333 207.4333

[[5]]
[1] 217.8167 221.8167

Now we just add up their lengths

vapply(union_lst, function(x) (x[2] - x[1]), 
                  FUN.VALUE = numeric(1)) %>%
   sum

Upvotes: 3

Related Questions