Reputation: 33
I have a data frame that I need to group by a combination of columns entries in order to conditionally mutate several columns using only an if statement. I tried following the code posted here Using dplyr to group_by and conditionally mutate only with if (without else) statement. However, my problem is slightly different.
I want to sum data (D1) by county between the months of October of t-1 year and September of t year. Month variable = Mth. As you can see below, the sum of D1 from 10/2004 to 09/2005 is 300. My dataset is more than 100K rows for the entire US for a period of 15 years. For the purpose of demonstration, I have shown just two years for just one county. In my dataset there will be more counties, I have to address that in the code.
Year County St D1 Mth Output
2005 Multon OR 20 10 NA
2005 Multon OR 0 10 NA
2005 Multon OR 10 9 300
2005 Multon OR 10 9 NA
2005 Multon OR 20 8 NA
2005 Multon OR 20 8 NA
2005 Multon OR 30 7 NA
2005 Multon OR 10 7 NA
2005 Multon OR 0 6 NA
2005 Multon OR 20 6 NA
2005 Multon OR 10 5 NA
2005 Multon OR 30 5 NA
2005 Multon OR 0 4 NA
2005 Multon OR 30 4 NA
2005 Multon OR 20 3 NA
2005 Multon OR 10 3 NA
2005 Multon OR 30 2 NA
2005 Multon OR 10 2 NA
2005 Multon OR 0 1 NA
2005 Multon OR 10 1 NA
2004 Multon OR 0 12 NA
2004 Multon OR 20 12 NA
2004 Multon OR 0 11 NA
2004 Multon OR 10 10 NA
2004 Multon OR 10 9 80
2004 Multon OR 20 8 NA
2004 Multon OR 20 8 NA
2004 Multon OR 30 7 NA
Can someone please help me with this query? I am learning R through this project.
Thanks!
Ritika
Upvotes: 0
Views: 1206
Reputation: 146154
As best as I can tell, this is what you want:
df %>%
mutate(
grouper = Year + (Mth >= 10),
grouper = paste("Oct", grouper - 1, "- Sept", grouper)
) %>%
group_by(grouper, County) %>%
mutate(Output = sum(D1))
# # A tibble: 28 × 7
# # Groups: grouper, County [3]
# Year County St D1 Mth Output grouper
# <int> <chr> <chr> <int> <int> <int> <chr>
# 1 2005 Multon OR 20 10 20 Oct 2005 - Sept 2006
# 2 2005 Multon OR 0 10 20 Oct 2005 - Sept 2006
# 3 2005 Multon OR 10 9 300 Oct 2004 - Sept 2005
# 4 2005 Multon OR 10 9 300 Oct 2004 - Sept 2005
# 5 2005 Multon OR 20 8 300 Oct 2004 - Sept 2005
# 6 2005 Multon OR 20 8 300 Oct 2004 - Sept 2005
# 7 2005 Multon OR 30 7 300 Oct 2004 - Sept 2005
# 8 2005 Multon OR 10 7 300 Oct 2004 - Sept 2005
# 9 2005 Multon OR 0 6 300 Oct 2004 - Sept 2005
# 10 2005 Multon OR 20 6 300 Oct 2004 - Sept 2005
# # … with 18 more rows
Using this sample data:
df = read.table(text = 'Year County St D1 Mth Output
2005 Multon OR 20 10 NA
2005 Multon OR 0 10 NA
2005 Multon OR 10 9 300
2005 Multon OR 10 9 NA
2005 Multon OR 20 8 NA
2005 Multon OR 20 8 NA
2005 Multon OR 30 7 NA
2005 Multon OR 10 7 NA
2005 Multon OR 0 6 NA
2005 Multon OR 20 6 NA
2005 Multon OR 10 5 NA
2005 Multon OR 30 5 NA
2005 Multon OR 0 4 NA
2005 Multon OR 30 4 NA
2005 Multon OR 20 3 NA
2005 Multon OR 10 3 NA
2005 Multon OR 30 2 NA
2005 Multon OR 10 2 NA
2005 Multon OR 0 1 NA
2005 Multon OR 10 1 NA
2004 Multon OR 0 12 NA
2004 Multon OR 20 12 NA
2004 Multon OR 0 11 NA
2004 Multon OR 10 10 NA
2004 Multon OR 10 9 80
2004 Multon OR 20 8 NA
2004 Multon OR 20 8 NA
2004 Multon OR 30 7 NA', header = T)
Upvotes: 1