Date and sum of name grouping in R

Question

So this is the dataframe I'm working with

CAND_NAME   TRANSACTION_DT   TRANSACTION_AMT
K            10312017            100 
K            11032017            500
K            11032017            100 
K            11102017            200 
K            11112017            200 
R            11032016            200 
R            11032017            100 
R            11042017            500

I wanted to group the TRANSACTION_DT into 7 days by candidate. In other words, 10292017 (or 10/29/2017), 10302017, 10312017, 11012017, 11022017, 11032017, and 11042017 would be grouped and the transaction amount for that group will be summed up.

I wanted this to give:

CAND_NAME   TRANSACTION_WK_START   TRANSACTION_AMT
K            10292017            700 
K            11052017            400 
R            10302016            200 
R            10292017            600

Note for R, the date 11032016 and 11032017 are different, and I don't want them to be grouped together.

So far I have

df.new <- df %>%
  group_by(TRANSACTION_DT,CAND_NAME) %>%
  summarize(TRANSACTION_AMT = sum(TRANSACTION_AMT))

Dave2e · Accepted Answer

Here is a potential solution. Covert Transaction_dt to a date and then create a sequence of weeks. In this case I am using the earliest date in the data. This might be a problem since it may not start at the beginning of the week. If so, it would be a matter of defining the sequence to start from a known acceptable beginning. Then it is just a matter of cut and grouping.

df<-structure(list(CAND_NAME = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 
            2L, 2L), .Label = c("K", "R"), class = "factor"), TRANSACTION_DT = c(10312017L, 
             11032017L, 11032017L, 11102017L, 11112017L, 11032016L, 11032017L, 
             11042017L), TRANSACTION_AMT = c(100L, 500L, 100L, 200L, 200L, 
              200L, 100L, 500L)), .Names = c("CAND_NAME", "TRANSACTION_DT", 
              "TRANSACTION_AMT"), class = "data.frame", row.names = c(NA, -8L))

#convert to dates
df$date<-as.Date(as.character(df$TRANSACTION_DT), format="%m%d%Y")
#define the breaks
breaks<-seq(from=min(df$date), to=max(df$date)+7, by = "week")
#define the cut points
df$breaks<-cut(df$date, breaks)

library(dplyr)
df %>%
  group_by(breaks,CAND_NAME) %>%
  summarize(TRANSACTION_AMT = sum(TRANSACTION_AMT))



#     breaks CAND_NAME TRANSACTION_AMT
#                     
#1 2016-11-03         R             200
#2 2017-10-26         K             100
#3 2017-11-02         K             600
#4 2017-11-02         R             600
#5 2017-11-09         K             400

Date and sum of name grouping in R

Answers (1)

Related Questions