Reputation: 61
columnames=c("UserId","ItemName","Date","MonthTake","YearTaken")
row1=c("1" , "Apple" , "8/30/2019" , "8" , "2019")
row2=c("1" , "Apple" , "8/31/2019" , "8" , "2019")
row3=c("1" , "Apple" , "9/1/2019" , "9" , "2019")
row4=c("1" , "Apple" , "9/2/2019" , "9" , "2019")
row5=c("1" , "Banana" , "9/3/2019" , "9" , "2019")
row6=c("1" , "Banana" , "9/4/2019" , "9" , "2019")
row7=c("2" , "Banana" , "9/3/2019" , "9" , "2019")
row8=c("2" , "Banana" , "9/4/2019" , "9" , "2019")
d<-rbind(columnames,row1,row2,row3,row4,row5,row6,row7,row8)
d1<-as.data.frame(d)
header.true <- function(d1) {
names(d1) <- as.character(unlist(d1[1,]))
d1[-1,]
}
d1<-header.true(d1)
UserID ItemName Date MonthTaken YearTaken
1 Apple 8/30/219 8 2019
1 Apple 8/31/219 8 2019
1 Apple 9/1/219 9 2019
1 Apple 9/2/219 9 2019
1 Banana 9/3/219 9 2019
1 Banana 9/4/219 9 2019
2 Banana 9/3/219 9 2019
2 Banana 9/4/219 9 2019
data1%>%
group_by(UserID,ItemName,MonthTaken,YearTaken, days=c(0,cumsum(diff(Date)!=1)))%>%
summarise(Date=n())%>%
select(-days)
I am using the above code and I am able to have the consecutive days per user id , per Item Name, on the Month and year the item was used.
Like this:
UserID ItemName Date MonthTaken YearTaken
1 Apple 2 8 2019
1 Apple 2 9 2019
1 Banana 2 9 2019
2 Banana 2 9 2019
I want to remove the Month and Year groupings, but somehow include the starting date of the count instead. The goal is that if someone took something on 8/31/2019 and 9/1/2019, the Consecutive count is 2 starting in August, instead of 1 for August and 1 for September.
End goal like this :
UserID ItemName StartDate ConsecutiveDays
1 Apple 8/30/219 4
1 Banana 9/3/219 2
2 Banana 9/3/219 2
Upvotes: 1
Views: 593
Reputation: 30494
Make sure Date
is in Date
format:
d1$Date <- as.Date(d1$Date, format = "%m/%d/%Y")
Adapting what you have, would not group_by
MonthTaken
or YearTaken
, and add StartDate
to summarise
:
d1 %>%
arrange(UserID, ItemName, Date) %>%
group_by(UserID, ItemName, days = c(0, cumsum(diff(Date) != 1))) %>%
summarise(ConsecutiveDays=n(),
StartDate = first(Date))%>%
select(-days)
Edit: Added arrange
to account for possibility of dates out of order. This needs to be sorted before cumsum
in group_by
.
Output
# A tibble: 3 x 4
# Groups: UserID, ItemName [3]
UserID ItemName ConsecutiveDays StartDate
<fct> <fct> <int> <date>
1 1 Apple 4 2019-08-30
2 1 Banana 2 2019-09-03
3 2 Banana 2 2019-09-03
Upvotes: 2