Sebastian Zeki
Sebastian Zeki

Reputation: 6874

How to merge rows when the between row date is within a specified time limit

I have some data as follows

Team    Date         Text
1        13/12/01     mytext1
2        12/01/13     mytext2
2        05/02/13     mytext3
3        14/02/14     mytext4
3        16/02/14     mytext5
3        01/03/14     mytext6
4        29/03/12     mytext7

I would like to merge rows where the difference between the Date column between two rows within a Team is <29 which should result in:

Team         Date                                          Text
1            13/12/01                                      mytext1
2,2          12/01/13,05/02/13                             mytext2,mytext3
3,3,3        14/02/14,16/02/14,16/02/14                    mytext4,mytext5,mytext6
4            29/03/12                                      mytext7

I am stuck on how to merge according to this condition. So far I have:

df%>%
  mutate(diff_days = difftime(Date , lag(Date ),units="days"))%>%
  group_by(Team,diff_days)%>%
  summarize_all(paste,collapse=", ")

but how to implement my condition of merging rows?

Upvotes: 2

Views: 372

Answers (2)

akrun
akrun

Reputation: 886948

We can also use

library(dplyr)
df %>%
 mutate(Date = as.Date(Date, '%d/%m/%y')) %>%
 group_by(Team) %>%
 group_by(grp = cumsum(Date - 
                 lag(Date, default = first(Date)) > 29), .add = TRUE) %>%
  summarise(across(c(Date, Text), toString), .groups = 'drop') %>%
  select(-grp) 

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 388817

Within each Team you can create a new group whenever a difference between two consecutive dates is greater than 29. For each group collapse the Date and Text column in one comma-separated string.

library(dplyr)

df %>%
  mutate(Date = lubridate::dmy(Date)) %>%
  group_by(Team) %>%
  group_by(grp = cumsum(Date - 
                 lag(Date, default = first(Date)) > 29), .add = TRUE) %>%
  summarise(across(c(Date, Text), toString)) %>%
  select(-grp) %>%
  ungroup

#   Team Date                               Text                     
#  <int> <chr>                              <chr>                    
#1     1 2001-12-13                         mytext1                  
#2     2 2013-01-12, 2013-02-05             mytext2, mytext3         
#3     3 2014-02-14, 2014-02-16, 2014-03-01 mytext4, mytext5, mytext6
#4     4 2012-03-29                         mytext7        

Upvotes: 3

Related Questions