Reputation: 310
Let's say I have a set of partly overlapping intervals:
library(lubridate)
date1 <- as.POSIXct("2000-03-08 01:59:59")
date2 <- as.POSIXct("2001-02-28 12:00:00")  # 2001 is not a leap year, so Feb 29 does not exist
date3 <- as.POSIXct("1999-03-08 01:59:59")
date4 <- as.POSIXct("2002-02-28 12:00:00")  # 2002 is not a leap year either
date5 <- as.POSIXct("2000-03-08 01:59:59")
date6 <- as.POSIXct("2004-02-29 12:00:00")
int1 <- interval(date1, date2)  # new_interval() in older lubridate versions
int2 <- interval(date3, date4)
int3 <- interval(date5, date6)
Does anyone have an idea how one could construct a time series plot that provides, for every point in time, the number of overlapping intervals at that point?
So, to take the above example: for a given date in January 2000, the function I'm looking for would return the value "1" (the date is only within int2), while for a date in January 2001 it would return "3" (since that date is within int1, int2 and int3). Etc.
Any ideas?
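A minimal base-R sketch of the counting function described above (it does not use lubridate). It assumes closed intervals, i.e. a point equal to a start or end counts as inside, and assumes the end dates `2001-02-29` and `2002-02-29` were meant as February 28, since those years have no February 29. The helper name `count_overlaps()` is hypothetical:

```r
# Interval endpoints from the question, in UTC to avoid DST surprises.
# Assumption: the Feb 29 end dates in non-leap years were meant to be Feb 28.
starts <- as.POSIXct(c("2000-03-08 01:59:59", "1999-03-08 01:59:59",
                       "2000-03-08 01:59:59"), tz = "UTC")
ends   <- as.POSIXct(c("2001-02-28 12:00:00", "2002-02-28 12:00:00",
                       "2004-02-29 12:00:00"), tz = "UTC")

# Hypothetical helper: for each time point, count the closed intervals
# [starts[i], ends[i]] that contain it.
count_overlaps <- function(points, starts, ends) {
  vapply(points, function(p) sum(starts <= p & p <= ends), integer(1))
}

jan <- as.POSIXct(c("2000-01-15", "2001-01-15"), tz = "UTC")
count_overlaps(jan, starts, ends)
# [1] 1 3
```

This compares every point with every interval, which is fine for a handful of intervals but grows as points × intervals.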
Upvotes: 2
Views: 516
Reputation: 118799
Here's one way, using the foverlaps() function from the data.table package:
Please install the development version, 1.9.5, by following the installation instructions, as a bug affecting overlap joins on numeric types has been fixed there.
library(data.table) ## 1.9.5+
intervals = data.table(start = c(date1, date3, date5),
                       end   = c(date2, date4, date6))
# assuming your query is:
query = as.POSIXct(c("2000-01-01 00:00:00", "2001-01-01 00:00:00"))
We'll construct the query data.table with both start and end columns as well, because foverlaps() expects start and end columns in both tables:
querydt = data.table(start=query, end=query) # identical start,end
Then we can use foverlaps() as follows:
setkeyv(intervals, c("start", "end"))
ans = foverlaps(querydt, intervals, which=TRUE, nomatch=0L, type="within")
# xid yid
# 1: 1 1
# 2: 2 1
# 3: 2 2
# 4: 2 3
We first set the key, which sorts the data.table intervals by the given columns in increasing order and marks those columns as the key columns on which we want to perform the overlap join.
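Because setkeyv() reorders the rows, yid in the result refers to positions in the sorted table, not to int1/int2/int3 in the order they were listed. A base-R sketch of the same ordering with order(), assuming February 28 for the two non-leap-year end dates; the data.frame name intervals_df is hypothetical:

```r
# Base-R stand-in for the question's intervals, rows in the original order
# (int1, int2, int3). Assumption: Feb 28 end dates in 2001 and 2002.
intervals_df <- data.frame(
  start = as.POSIXct(c("2000-03-08 01:59:59", "1999-03-08 01:59:59",
                       "2000-03-08 01:59:59"), tz = "UTC"),
  end   = as.POSIXct(c("2001-02-28 12:00:00", "2002-02-28 12:00:00",
                       "2004-02-29 12:00:00"), tz = "UTC")
)

# Sort by start, breaking ties by end: the same row order that
# setkeyv(intervals, c("start", "end")) gives the data.table.
sorted <- intervals_df[order(intervals_df$start, intervals_df$end), ]
rownames(sorted)
# [1] "2" "1" "3"
```

Row 2 (the 1999 interval, int2 in the question) sorts first, so yid 1 in the foverlaps() output corresponds to int2.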
Then we use foverlaps() to find which rows of querydt overlap with (here: fall within, type="within") the rows of intervals. In this case, querydt consists of just points, as each row's start and end are identical. The result holds the matching index pairs for those rows of querydt that fall within intervals: which=TRUE returns indices (xid for querydt, yid for the key-sorted intervals) instead of the merged result, and nomatch=0L removes all rows with no match.
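For a point query, type="within" reduces to start <= q <= end. A brute-force base-R sketch that produces the same xid/yid pairs as the foverlaps() call in this answer, for this small example only: it checks every query against every interval, whereas foverlaps() is designed to avoid that. The function name within_pairs() is hypothetical, and the February 28 end dates are an assumption:

```r
# Intervals already sorted by (start, end), as setkeyv() leaves them.
# Assumption: Feb 28 end dates for the non-leap years 2001 and 2002.
y_start <- as.POSIXct(c("1999-03-08 01:59:59", "2000-03-08 01:59:59",
                        "2000-03-08 01:59:59"), tz = "UTC")
y_end   <- as.POSIXct(c("2002-02-28 12:00:00", "2001-02-28 12:00:00",
                        "2004-02-29 12:00:00"), tz = "UTC")
query   <- as.POSIXct(c("2000-01-01 00:00:00", "2001-01-01 00:00:00"),
                      tz = "UTC")

# Hypothetical helper: all (query, interval) index pairs where the
# zero-width query [q, q] lies within [y_start, y_end]. Queries with no
# match produce no rows, mirroring nomatch = 0L.
within_pairs <- function(q, y_start, y_end) {
  # Every (query, interval) combination; xid varies fastest.
  grid <- expand.grid(xid = seq_along(q), yid = seq_along(y_start))
  # A point q lies within [start, end] iff start <= q <= end.
  keep <- y_start[grid$yid] <= q[grid$xid] & q[grid$xid] <= y_end[grid$yid]
  res <- grid[keep, c("xid", "yid")]
  res <- res[order(res$xid, res$yid), ]
  rownames(res) <- NULL
  res
}

res <- within_pairs(query, y_start, y_end)
res
```

Counting rows per xid in res (for example with table(res$xid)) gives 1 and 3, the same counts as the aggregation step that follows.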
Now all we have to do is aggregate by xid and count the number of rows in each group to get the count:
ans[, .N, by=xid]
# xid N
# 1: 1 1
# 2: 2 3
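One caveat when turning these counts into a time series: nomatch=0L drops query points that fall inside no interval, so they are missing from the .N table instead of showing a count of 0. A base-R sketch of filling them in with tabulate(), using an xid vector typed in by hand and a hypothetical third query point that matched nothing:

```r
# xid column from a which = TRUE, nomatch = 0L overlap join, typed in by
# hand. Assumption: there were 3 query points and the third matched nothing.
xid <- c(1L, 2L, 2L, 2L)
n_queries <- 3L

# tabulate() counts occurrences of each integer 1..nbins, so unmatched
# queries get an explicit 0 instead of disappearing.
counts <- tabulate(xid, nbins = n_queries)
counts
# [1] 1 3 0
```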
Check ?foverlaps for more info.
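For the plot the question asks for, the count can be evaluated on a regular grid of time points and drawn as a step function. A base-R sketch, assuming a monthly grid is fine-grained enough and February 28 end dates for 2001 and 2002. It compares every grid point with every interval, so with many intervals or a fine grid, passing the grid as the query to foverlaps() is likely to scale better:

```r
# Interval endpoints (assumption: Feb 28 ends for non-leap years 2001/2002).
starts <- as.POSIXct(c("2000-03-08 01:59:59", "1999-03-08 01:59:59",
                       "2000-03-08 01:59:59"), tz = "UTC")
ends   <- as.POSIXct(c("2001-02-28 12:00:00", "2002-02-28 12:00:00",
                       "2004-02-29 12:00:00"), tz = "UTC")

# Monthly grid of time points covering all intervals (72 months).
grid <- seq(as.POSIXct("1999-01-01", tz = "UTC"),
            as.POSIXct("2004-12-01", tz = "UTC"), by = "month")

# Number of closed intervals containing each grid point.
counts <- vapply(seq_along(grid),
                 function(i) sum(starts <= grid[i] & grid[i] <= ends),
                 integer(1))

# Step plot of the overlap count over time.
plot(grid, counts, type = "s", xlab = "Time",
     ylab = "Overlapping intervals")
```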
Upvotes: 5