Reputation: 15
I have daily transaction data that I want to convert into a weekly sum (i.e. the sum of transactions each week for the week ending Sunday).
It is currently structured like this
dataset <- data.frame(date=as.Date(c("20200407", "20200407", "20200407", "20200407"), "%Y%m%d"), category=c("Petrol Station", "Accomodation", "Discount Store", "Shopping"), amount=c(5431.47, 839.1, 2399.13, 1305.82))
amount - is the spend.
date category amount
2020-04-07 Petrol Station 5431.47
2020-04-07 Accommodation 839.1
2020-04-07 Discount Store 2399.13
2020-04-07 Shopping 1305.82
In total, there's about 100k lines over 3 years of data
I am able to create a week variable using the below (which gives me W1, W2, ...,etc.) and sum it by grouping it. However, it doesn't seem to work properly in Week 1 and Week 52 where there isn't a full 7 days
dataset$Week <- strftime(dataset$date, format = "%Y-W%V")
Any suggestions would be appreciated.
Upvotes: 1
Views: 154
Reputation: 1453
Try this:
library(tidyverse)
library(lubridate)
dataset%>%
mutate(week = week(date))%>%
group_by(week)%>%
summarize(week_sum = sum(amount))
If you want to get an etxtra "category"-evaluation than:
dataset%>%
mutate(week = week(date))%>%
group_by(week, category)%>%
summarize(week_category_sum = sum(amount))
EDIT:
@TobKel Something like this - but for each individual subcategory in the category column (there is around 100).
I can generate the below fine, its just the Week 01 and Week 52 that don't have a full seven days in them
Upvotes: 1
Reputation: 4169
With your example using the data.table and lubridate packages:
library(lubridate)
library(data.table)
# convert data.frame to data.table
setDT(dataset)
# add week variable, then sum grouping by week and category
dataset[, week := week(date)][,
wkSum = sum(amount), keyby = c("week", "category")]
Modifying your example data a little to cover multiple weeks and repeats of categories to show that it works, this input
date category amount
1: 2020-04-06 Shopping 1203.54
2: 2020-04-07 Petrol Station 5431.47
3: 2020-04-07 Shopping 1305.82
4: 2020-04-08 Shopping 1400.43
Gives the output
week category wkSum
1: 14 Petrol Station 5431.47
2: 14 Shopping 2509.36
3: 15 Shopping 1400.43
Upvotes: 1
Reputation: 367
@TobKei's contribution is great you just need to change a bit the structure of your dataset to achieve what you need. Essentially, you need to have each category's values as a variable to be able to generate a weekly sum without it being one single number. You may want to try the following:
library(tidyverse)
library(dplyr)
library(lubridate)
sum_output <- dataset %>%
mutate(week = week(date)) %>%
spread(., category, amount) %>%
group_by(week) %>%
summarise(accomodation_week_sum = sum(Accomodation),
discount_store_week_sum= sum(`Discount Store`),
Petrol_week_sum=sum(`Petrol Station`),
shopping_week_sum= sum(Shopping))
For the moment it's the same number since you don't have more observations from the same week but eventually if you had more the group_by()
would return you one sum per week.
UPDATE: The following should work in case you have hundreds of categories that you obviously don't want to be typing individually:
library(tidyverse)
library(dplyr)
library(lubridate)
sum_output <- dataset %>%
mutate(week = week(date)) %>%
split(.$category) %>%
lapply(., group_by, week) %>%
lapply(., summarise, week_sum_amount=sum(amount)) %>%
bind_rows(., .id = "week_sum_amount") %>%
data.frame(check.names = FALSE)
The main addition is the split()
which will separate all your categories in different dataframes within one list. A list of dataframes.
Upvotes: 0