Kat
Kat

Reputation: 33

Counting sequential dates in R to determine the length of an event

I have a dataframe containing dates when a given event occurred. Some events go on for several days, and I want to summarise each event based on its start date and its total length (in days).

I want to go from this:

Date
2020-01-01
2020-01-02
2020-01-03
2020-01-15
2020-01-20
2020-01-21

To this:

StartDate EventLength
2020-01-01 3
2020-01-15 1
2020-01-20 2

I've tried various approaches with aggregate, ave, seq_along and lag, but I haven't managed to get a count of event length that resets when the dates aren't sequential.

Code for the example data frame in case it's helpful:

Date <- c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-15", "2020-01-20", "2020-01-21")
df <- data.frame(Date)
df$Date <- as.Date(df$Date, origin = "1970-01-01") 

Upvotes: 3

Views: 387

Answers (3)

GKi
GKi

Reputation: 39717

You can split by cumsum(c(0, diff(df$Date) != 1) and then take the first date and combine it with the length assuming the dates are sorted.

do.call(rbind, lapply(split(df$Date, cumsum(c(0, diff(df$Date) != 1))),
  function(x) data.frame(StartDate=x[1], EventLength=length(x))))
#   StartDate EventLength
#0 2020-01-01           3
#1 2020-01-15           1
#2 2020-01-20           2

or another option using rle:

i <- cumsum(c(0, diff(df$Date) != 1))
data.frame(StartDate = df$Date[c(1, diff(i)) == 1], EventLength=rle(i)$lengths)
#   StartDate EventLength
#1 2020-01-01           3
#2 2020-01-15           1
#3 2020-01-20           2

Upvotes: 3

Rui Barradas
Rui Barradas

Reputation: 76651

Here is a base R solution with a cumsum trick followed by ave/table.

d <- c(0, diff(df$Date) != 1)
res <- ave(df$Date, cumsum(d), FUN = function(x) x[1])
res <- as.data.frame(table(a))
names(res) <- c("Date", "EventLength")

res
#        Date EventLength
#1 2020-01-01           3
#2 2020-01-15           1
#3 2020-01-20           2

Upvotes: 2

AnilGoyal
AnilGoyal

Reputation: 26238

I propose dplyr approach which is incidentally very similar to @Rui's approach

df %>% mutate(dummy = c(0, diff(Date))) %>%
  group_by(grp = cumsum(dummy != 1)) %>%
  summarise(Date = first(Date),
            event_count = n(), .groups = 'drop')

# A tibble: 3 x 3
    grp Date       event_count
  <int> <date>           <int>
1     1 2020-01-01           3
2     2 2020-01-15           1
3     3 2020-01-20           2

Upvotes: 2

Related Questions