Reputation: 53
I need to aggregate multiple months from original data with dataframe in R, e.g: data frame with datetime include 2017 and 2018.
date category amt
1 2017-08-05 A 0.1900707
2 2017-08-06 B 0.2661277
3 2017-08-07 c 0.4763196
4 2017-08-08 A 0.5183718
5 2017-08-09 B 0.3021019
6 2017-08-10 c 0.3393616
What I want is to sum based on 6 month period and category:
period category sum
1 2017_secondPeriod A 25.00972
2 2018_firstPeriod A 25.59850
3 2017_secondPeriod B 24.96924
4 2018_firstPeriod B 24.79649
5 2017_secondPeriod c 20.17096
6 2018_firstPeriod c 27.01794
What I did: 1. select the last 6 months of 2017, like wise 2018 2. add a new column for each subset to indicate the period 3. Combine 2 subset again 4. aggregate as following:
library(lubridate)
df <- data.frame(
date = today() + days(1:300),
category = c("A","B","c"),
amt = runif(300)
)
df2017_secondHalf <- subset(df, month(df$date) %in% c(7,8,9,10,11,12) & year(df$date) == 2017)
f2018_firstHalf <- subset(df, month(df$date) %in% c(1,2,3,4,5,6) & year(df$date) == 2018)
sum1 <- aggregate(df2017_secondHalf$amt, by=list(Category=df2017_secondHalf$Category), FUN=sum)
sum2 <- aggregate(df2018_firstHalf$amt, by=list(Category=df2018_secondHalf$Category), FUN=sum)
df2017_secondHalf$period <- '2017_secondPeriod'
df2018_firstHalf$period <- '2018_firstPeriod'
aggregate(x = df$amt, by = df[c("period", "category")], FUN = sum)
I try to figure out but did not know how to aggregate multple months e.g, 3 months, or 6 months.
Thanks in advance Any suggesstion?
Upvotes: 2
Views: 2184
Reputation: 5898
With lubridate and tidyverse (dplyr & magrittr)
First, let's create groups with Semesters, Quarter, and "Trimonthly".
library(tidyverse)
library(lubridate)
df <- df %>% mutate(Semester = semester(date, with_year = TRUE),
Quarter = quarter(date, with_year = TRUE),
Trimonthly = round_date(date, unit = "3 months" ))
Lubridate's semester() breaks by semsters and gives you a 1 (Jan-Jun) or 2 (Jul-Aug); quarter() does a similar thing with quarters. I add a third, the more basic round_date function, where you can specify your time frame in the form of size and time units. It yields the first date of such time frame. I deliberately name it "Trimonthly" so you can see how it compares to quarter()
Pivot.Semester <- df %>%
group_by(Semester, category) %>%
summarise(Semester.sum = sum(amt))
Pivot.Quarter <- df %>%
group_by(Quarter, category) %>%
summarise(Quarter.sum = sum(amt))
Pivot.Trimonthly <- df %>%
group_by(Trimonthly, category) %>%
summarise(Trimonthly.sum = sum(amt))
Pivot.Semester
Pivot.Quarter
Pivot.Trimonthly
Optional: If you want to join the summarised data to the original DF.
df <- df %>% left_join(Pivot.Semester, by = c("category", "Semester")) %>%
left_join(Pivot.Quarter, by = c("category", "Quarter")) %>%
left_join(Pivot.Trimonthly, by = c("category", "Trimonthly"))
df
Upvotes: 2
Reputation: 269596
Here is a 3 line solution that uses no package. Let k
be the number of months in a period. For half year periods k
is 6. For quarter year periods k
would be 3, etc. Replace 02 in the sprintf
format with 1 if you want one digit suffices (but not for monthly since those must be two digit). Further modify the sprintf
format if you want it to exactly match the question.
k <- 6
period <- with(as.POSIXlt(DF$date), sprintf("%d-%02d", year + 1900, (mon %/% k) + 1))
aggregate(amt ~ category + period, DF, sum)
giving:
category period amt
1 A 2017-02 0.7084425
2 B 2017-02 0.5682296
3 c 2017-02 0.8156812
At the expense of using one package we can simplify the quarterly and monthly calculations by replacing the formula for period
with one of these:
library(zoo)
# quarterly
period <- as.yearqtr(DF$date)
# monthly
period <- as.yearmon(DF$date)
Note: The input in reproducible form is:
Lines <- "date category amt
1 2017-08-05 A 0.1900707
2 2017-08-06 B 0.2661277
3 2017-08-07 c 0.4763196
4 2017-08-08 A 0.5183718
5 2017-08-09 B 0.3021019
6 2017-08-10 c 0.3393616"
DF <- read.table(text = Lines)
DF$date <- as.Date(DF$date)
Upvotes: 0