Josh
Josh

Reputation: 15

Converting daily transaction data to the sum of the end of each week

I have daily transaction data that I want to convert into a weekly sum (i.e. the sum of transactions each week for the week ending Sunday).

It is currently structured like this

dataset <- data.frame(date=as.Date(c("20200407", "20200407", "20200407", "20200407"), "%Y%m%d"), category=c("Petrol Station", "Accomodation", "Discount Store", "Shopping"), amount=c(5431.47, 839.1, 2399.13, 1305.82))

In total, there's about 100k lines over 3 years of data

I am able to create a week variable using the below (which gives me W1, W2, ...,etc.) and sum it by grouping it. However, it doesn't seem to work properly in Week 1 and Week 52 where there isn't a full 7 days

dataset$Week <- strftime(dataset$date, format = "%Y-W%V")

Any suggestions would be appreciated.

Upvotes: 1

Views: 154

Answers (3)

TobKel
TobKel

Reputation: 1453

Try this:

library(tidyverse)
library(lubridate)

dataset%>%
  mutate(week = week(date))%>%
  group_by(week)%>%
  summarize(week_sum = sum(amount))

If you want to get an etxtra "category"-evaluation than:

dataset%>%
  mutate(week = week(date))%>%
  group_by(week, category)%>%
  summarize(week_category_sum = sum(amount))

EDIT:

@TobKel Something like this - but for each individual subcategory in the category column (there is around 100).

I can generate the below fine, its just the Week 01 and Week 52 that don't have a full seven days in them

output

Upvotes: 1

rg255
rg255

Reputation: 4169

With your example using the data.table and lubridate packages:

library(lubridate)
library(data.table)
# convert data.frame to data.table
setDT(dataset)
# add week variable, then sum grouping by week and category 
dataset[, week := week(date)][, 
  wkSum = sum(amount), keyby = c("week", "category")]

Modifying your example data a little to cover multiple weeks and repeats of categories to show that it works, this input

         date       category  amount
1: 2020-04-06       Shopping 1203.54
2: 2020-04-07 Petrol Station 5431.47
3: 2020-04-07       Shopping 1305.82
4: 2020-04-08       Shopping 1400.43

Gives the output

   week       category   wkSum
1:   14 Petrol Station 5431.47
2:   14       Shopping 2509.36
3:   15       Shopping 1400.43

Upvotes: 1

Cya
Cya

Reputation: 367

@TobKei's contribution is great you just need to change a bit the structure of your dataset to achieve what you need. Essentially, you need to have each category's values as a variable to be able to generate a weekly sum without it being one single number. You may want to try the following:

library(tidyverse)
library(dplyr)
library(lubridate)

sum_output <- dataset %>%
  mutate(week = week(date)) %>%
  spread(., category, amount) %>%
  group_by(week) %>%
  summarise(accomodation_week_sum = sum(Accomodation), 
         discount_store_week_sum= sum(`Discount Store`), 
         Petrol_week_sum=sum(`Petrol Station`), 
         shopping_week_sum= sum(Shopping))

For the moment it's the same number since you don't have more observations from the same week but eventually if you had more the group_by() would return you one sum per week.

UPDATE: The following should work in case you have hundreds of categories that you obviously don't want to be typing individually:

library(tidyverse)
library(dplyr)
library(lubridate)

sum_output <- dataset %>%
  mutate(week = week(date)) %>%
  split(.$category) %>%
  lapply(., group_by, week) %>%
  lapply(., summarise, week_sum_amount=sum(amount)) %>%
  bind_rows(., .id = "week_sum_amount") %>% 
  data.frame(check.names = FALSE)

The main addition is the split() which will separate all your categories in different dataframes within one list. A list of dataframes.

Upvotes: 0

Related Questions