Jenna Davis
Jenna Davis

Reputation: 61

Count consecutive days and include column that indicates the start date of count

columnames=c("UserId","ItemName","Date","MonthTake","YearTaken")

row1=c("1" , "Apple"  ,      "8/30/2019"   , "8"  , "2019")
row2=c("1" , "Apple"  ,      "8/31/2019"   , "8"  , "2019")
row3=c("1" , "Apple"  ,      "9/1/2019"    , "9"  , "2019")
row4=c("1" , "Apple"  ,      "9/2/2019"    , "9"  , "2019")
row5=c("1" , "Banana" ,      "9/3/2019"    , "9"  , "2019")
row6=c("1" , "Banana" ,      "9/4/2019"    , "9"  , "2019")
row7=c("2" , "Banana" ,      "9/3/2019"    , "9"  , "2019")
row8=c("2" , "Banana" ,      "9/4/2019"    , "9"  , "2019")

d<-rbind(columnames,row1,row2,row3,row4,row5,row6,row7,row8)
d1<-as.data.frame(d)

header.true <- function(d1) {
  names(d1) <- as.character(unlist(d1[1,]))
  d1[-1,]
}
d1<-header.true(d1)






  UserID         ItemName     Date        MonthTaken   YearTaken 
   1             Apple        8/30/219     8            2019
   1             Apple        8/31/219     8            2019
   1             Apple        9/1/219      9            2019
   1             Apple        9/2/219      9            2019
   1             Banana       9/3/219      9            2019
   1             Banana       9/4/219      9            2019
   2             Banana       9/3/219      9            2019
   2             Banana       9/4/219      9            2019



  data1%>%
        group_by(UserID,ItemName,MonthTaken,YearTaken, days=c(0,cumsum(diff(Date)!=1)))%>%
        summarise(Date=n())%>%
        select(-days)

I am using the above code and I am able to have the consecutive days per user id , per Item Name, on the Month and year the item was used.

Like this:

UserID               ItemName     Date               MonthTaken   YearTaken 
       1             Apple        2                      8            2019

       1             Apple        2                      9            2019

       1             Banana       2                      9            2019

       2             Banana       2                      9            2019

I want to remove the Month and Year groupings, but somehow include the starting date of the count instead. The goal is that if someone took something on 8/31/2019 and 9/1/2019, the Consecutive count is 2 starting in August, instead of 1 for August and 1 for September.

End goal like this :

  UserID         ItemName     StartDate     ConsecutiveDays 
   1             Apple        8/30/219        4
   1             Banana       9/3/219         2
   2             Banana       9/3/219         2

Upvotes: 1

Views: 593

Answers (1)

Ben
Ben

Reputation: 30494

Make sure Date is in Date format:

d1$Date <- as.Date(d1$Date, format = "%m/%d/%Y")

Adapting what you have, would not group_by MonthTaken or YearTaken, and add StartDate to summarise:

d1 %>%
  arrange(UserID, ItemName, Date) %>%
  group_by(UserID, ItemName, days = c(0, cumsum(diff(Date) != 1))) %>%
  summarise(ConsecutiveDays=n(),
            StartDate = first(Date))%>%
  select(-days)

Edit: Added arrange to account for possibility of dates out of order. This needs to be sorted before cumsum in group_by.

Output

# A tibble: 3 x 4
# Groups:   UserID, ItemName [3]
  UserID ItemName ConsecutiveDays StartDate 
  <fct>  <fct>              <int> <date>    
1 1      Apple                  4 2019-08-30
2 1      Banana                 2 2019-09-03
3 2      Banana                 2 2019-09-03

Upvotes: 2

Related Questions