Ian Chang
Ian Chang

Reputation: 122

dplyr group data within N days by id

I would like to group data within N days by id.

library(tidyverse)
library(lubridate)

df<- tribble(
~id,~date,
"A","2014-07-19",
"A","2014-07-20",
"A","2014-07-21",
"B","2014-10-01",
"B","2014-10-03",
"B","2014-10-08",
"B","2014-10-10",
"B","2014-10-13",
"B","2014-10-17",
"B","2014-11-02",
"B","2014-11-04",
"B","2014-11-06",
"C","2013-03-27",
"C","2013-03-28",
"C","2013-04-01",
"C","2013-04-03",
"C","2013-04-05",
"C","2013-04-07",
"C","2013-05-27",
"C","2013-05-29",
"D","2015-01-09",
"D","2015-01-12",
"D","2015-01-14",
"D","2015-01-16"
) %>% mutate_at(vars(date),funs(ymd(.)))

The desired output would be looks like this

id  first_date  last_date    Count  
A   2014-07-19  2014-07-21   3      
B   2014-10-01  2014-10-03   2      
B   2014-10-08  2014-10-13   3      
B   2014-10-17  2014-10-17   1      
B   2014-11-02  2014-11-06   3      
C   2013-03-27  2013-03-28   2      
C   2013-04-01  2013-04-07   4      
C   2013-05-27  2013-05-29   2      
D   2015-01-09  2015-01-16   4

My solution was:

df %>% group_by(id) %>% 
mutate(diff=as.numeric(date-lag(date,default=first(date)))) %>% 
mutate(diff=if_else(diff>3,0,diff)) %>% 
mutate(rank=min_rank(cumsum(diff=='0'))) %>% 
group_by(id,rank) %>% 
summarise(first_date=min(date), last_date=max(date), Count=length(id)) %>% 
data.frame()

My solution can work, but I would like to know is there any simpler/elegant way to archive this? such as removing temp columns like diff and rank.

Upvotes: 1

Views: 192

Answers (1)

thelatemail
thelatemail

Reputation: 93813

I think you could just do the group_by calculations in-line:

df %>%
    group_by(id, counter=cumsum(c(FALSE,diff(date)>3))) %>%
    summarise(first_date = first(date), last_date = last(date), count = n()) %>%
    select(-counter)

## A tibble: 9 x 4
## Groups:   id [4]
#  id    first_date last_date  count
#  <chr> <date>     <date>     <int>
#1 A     2014-07-19 2014-07-21     3
#2 B     2014-10-01 2014-10-03     2
#3 B     2014-10-08 2014-10-13     3
#4 B     2014-10-17 2014-10-17     1
#5 B     2014-11-02 2014-11-06     3
#6 C     2013-03-27 2013-03-28     2
#7 C     2013-04-01 2013-04-07     4
#8 C     2013-05-27 2013-05-29     2
#9 D     2015-01-09 2015-01-16     4

Upvotes: 2

Related Questions