user1993617
user1993617

Reputation: 53

How to aggregate with multiple months with data frame R?

I need to aggregate multiple months from original data with dataframe in R, e.g: data frame with datetime include 2017 and 2018.

        date category       amt
  1 2017-08-05        A 0.1900707
  2 2017-08-06        B 0.2661277
  3 2017-08-07        c 0.4763196
  4 2017-08-08        A 0.5183718
  5 2017-08-09        B 0.3021019
  6 2017-08-10        c 0.3393616  

What I want is to sum based on 6 month period and category:

         period category       sum
1 2017_secondPeriod        A 25.00972
2  2018_firstPeriod        A 25.59850
3 2017_secondPeriod        B 24.96924
4  2018_firstPeriod        B 24.79649
5 2017_secondPeriod        c 20.17096
6  2018_firstPeriod        c 27.01794

What I did: 1. select the last 6 months of 2017, like wise 2018 2. add a new column for each subset to indicate the period 3. Combine 2 subset again 4. aggregate as following:

library(lubridate)
df <- data.frame(
  date = today() + days(1:300),
  category = c("A","B","c"),
  amt = runif(300)
)

df2017_secondHalf <- subset(df, month(df$date) %in% c(7,8,9,10,11,12) & year(df$date) == 2017)
f2018_firstHalf <- subset(df, month(df$date) %in% c(1,2,3,4,5,6) & year(df$date) == 2018)

sum1 <- aggregate(df2017_secondHalf$amt, by=list(Category=df2017_secondHalf$Category), FUN=sum)
sum2 <- aggregate(df2018_firstHalf$amt, by=list(Category=df2018_secondHalf$Category), FUN=sum)

df2017_secondHalf$period <- '2017_secondPeriod'
df2018_firstHalf$period <- '2018_firstPeriod'

aggregate(x = df$amt, by = df[c("period", "category")], FUN = sum)

I try to figure out but did not know how to aggregate multple months e.g, 3 months, or 6 months.

Thanks in advance Any suggesstion?

Upvotes: 2

Views: 2184

Answers (2)

Nicol&#225;s Velasquez
Nicol&#225;s Velasquez

Reputation: 5898

With lubridate and tidyverse (dplyr & magrittr)

First, let's create groups with Semesters, Quarter, and "Trimonthly".

library(tidyverse)
library(lubridate)

df <- df %>% mutate(Semester = semester(date, with_year = TRUE),
                    Quarter = quarter(date, with_year = TRUE),
                    Trimonthly = round_date(date, unit = "3 months" ))

Lubridate's semester() breaks by semsters and gives you a 1 (Jan-Jun) or 2 (Jul-Aug); quarter() does a similar thing with quarters. I add a third, the more basic round_date function, where you can specify your time frame in the form of size and time units. It yields the first date of such time frame. I deliberately name it "Trimonthly" so you can see how it compares to quarter()

Pivot.Semester <- df %>% 
    group_by(Semester, category) %>% 
    summarise(Semester.sum = sum(amt))
Pivot.Quarter <- df %>% 
    group_by(Quarter, category) %>% 
    summarise(Quarter.sum = sum(amt))
Pivot.Trimonthly <- df %>%
    group_by(Trimonthly, category) %>%
    summarise(Trimonthly.sum = sum(amt))
Pivot.Semester
Pivot.Quarter
Pivot.Trimonthly

Optional: If you want to join the summarised data to the original DF.

df <- df %>% left_join(Pivot.Semester, by = c("category", "Semester")) %>% 
    left_join(Pivot.Quarter, by = c("category", "Quarter")) %>% 
    left_join(Pivot.Trimonthly, by = c("category", "Trimonthly"))
df

Upvotes: 2

G. Grothendieck
G. Grothendieck

Reputation: 269596

Here is a 3 line solution that uses no package. Let k be the number of months in a period. For half year periods k is 6. For quarter year periods k would be 3, etc. Replace 02 in the sprintf format with 1 if you want one digit suffices (but not for monthly since those must be two digit). Further modify the sprintf format if you want it to exactly match the question.

k <- 6
period <- with(as.POSIXlt(DF$date), sprintf("%d-%02d", year + 1900, (mon %/% k) + 1))
aggregate(amt ~ category + period, DF, sum)

giving:

  category  period       amt
1        A 2017-02 0.7084425
2        B 2017-02 0.5682296
3        c 2017-02 0.8156812

At the expense of using one package we can simplify the quarterly and monthly calculations by replacing the formula for period with one of these:

library(zoo)

# quarterly
period <- as.yearqtr(DF$date)

# monthly
period <- as.yearmon(DF$date)

Note: The input in reproducible form is:

Lines <- "date category       amt
  1 2017-08-05        A 0.1900707
  2 2017-08-06        B 0.2661277
  3 2017-08-07        c 0.4763196
  4 2017-08-08        A 0.5183718
  5 2017-08-09        B 0.3021019
  6 2017-08-10        c 0.3393616"
DF <- read.table(text = Lines)
DF$date <- as.Date(DF$date)

Upvotes: 0

Related Questions