Ritika Khurana
Ritika Khurana

Reputation: 33

Using dplyr group by and mutate to sum up values based on year and month

I have a data frame that I need to group by a combination of columns entries in order to conditionally mutate several columns using only an if statement. I tried following the code posted here Using dplyr to group_by and conditionally mutate only with if (without else) statement. However, my problem is slightly different.

I want to sum data (D1) by county between the months of October of t-1 year and September of t year. Month variable = Mth. As you can see below, the sum of D1 from 10/2004 to 09/2005 is 300. My dataset is more than 100K rows for the entire US for a period of 15 years. For the purpose of demonstration, I have shown just two years for just one county. In my dataset there will be more counties, I have to address that in the code.

Year    County  St  D1  Mth Output
2005    Multon  OR  20  10  NA
2005    Multon  OR  0   10  NA
2005    Multon  OR  10  9   300
2005    Multon  OR  10  9   NA
2005    Multon  OR  20  8   NA
2005    Multon  OR  20  8   NA
2005    Multon  OR  30  7   NA
2005    Multon  OR  10  7   NA
2005    Multon  OR  0   6   NA
2005    Multon  OR  20  6   NA
2005    Multon  OR  10  5   NA
2005    Multon  OR  30  5   NA
2005    Multon  OR  0   4   NA
2005    Multon  OR  30  4   NA
2005    Multon  OR  20  3   NA
2005    Multon  OR  10  3   NA
2005    Multon  OR  30  2   NA
2005    Multon  OR  10  2   NA
2005    Multon  OR  0   1   NA
2005    Multon  OR  10  1   NA
2004    Multon  OR  0   12  NA
2004    Multon  OR  20  12  NA
2004    Multon  OR  0   11  NA
2004    Multon  OR  10  10  NA
2004    Multon  OR  10  9   80
2004    Multon  OR  20  8   NA
2004    Multon  OR  20  8   NA
2004    Multon  OR  30  7   NA

Can someone please help me with this query? I am learning R through this project.

Thanks!

Ritika

Upvotes: 0

Views: 1206

Answers (1)

Gregor Thomas
Gregor Thomas

Reputation: 146154

As best as I can tell, this is what you want:

df %>% 
  mutate(
    grouper = Year + (Mth >= 10),
    grouper = paste("Oct", grouper - 1, "- Sept", grouper)
  ) %>%
  group_by(grouper, County) %>%
  mutate(Output = sum(D1))
# # A tibble: 28 × 7
# # Groups:   grouper, County [3]
#     Year County St       D1   Mth Output grouper             
#    <int> <chr>  <chr> <int> <int>  <int> <chr>               
#  1  2005 Multon OR       20    10     20 Oct 2005 - Sept 2006
#  2  2005 Multon OR        0    10     20 Oct 2005 - Sept 2006
#  3  2005 Multon OR       10     9    300 Oct 2004 - Sept 2005
#  4  2005 Multon OR       10     9    300 Oct 2004 - Sept 2005
#  5  2005 Multon OR       20     8    300 Oct 2004 - Sept 2005
#  6  2005 Multon OR       20     8    300 Oct 2004 - Sept 2005
#  7  2005 Multon OR       30     7    300 Oct 2004 - Sept 2005
#  8  2005 Multon OR       10     7    300 Oct 2004 - Sept 2005
#  9  2005 Multon OR        0     6    300 Oct 2004 - Sept 2005
# 10  2005 Multon OR       20     6    300 Oct 2004 - Sept 2005
# # … with 18 more rows

Using this sample data:

df = read.table(text = 'Year    County  St  D1  Mth Output
2005    Multon  OR  20  10  NA
2005    Multon  OR  0   10  NA
2005    Multon  OR  10  9   300
2005    Multon  OR  10  9   NA
2005    Multon  OR  20  8   NA
2005    Multon  OR  20  8   NA
2005    Multon  OR  30  7   NA
2005    Multon  OR  10  7   NA
2005    Multon  OR  0   6   NA
2005    Multon  OR  20  6   NA
2005    Multon  OR  10  5   NA
2005    Multon  OR  30  5   NA
2005    Multon  OR  0   4   NA
2005    Multon  OR  30  4   NA
2005    Multon  OR  20  3   NA
2005    Multon  OR  10  3   NA
2005    Multon  OR  30  2   NA
2005    Multon  OR  10  2   NA
2005    Multon  OR  0   1   NA
2005    Multon  OR  10  1   NA
2004    Multon  OR  0   12  NA
2004    Multon  OR  20  12  NA
2004    Multon  OR  0   11  NA
2004    Multon  OR  10  10  NA
2004    Multon  OR  10  9   80
2004    Multon  OR  20  8   NA
2004    Multon  OR  20  8   NA
2004    Multon  OR  30  7   NA', header = T)

Upvotes: 1

Related Questions