Reputation: 2077
I have a data frame that has a similar structure to this
Year <- c("2000", "2001", "2002" ,"2003", "2004", "2005" ,"2006", "2007", "2008", "2009", "2010", "2011" ,"2012", "2013", "2014", "2015")
Sales <- c(2000,4800,6700,5000,7000,8000,3070,2000,1800,7100,6600,5000,6000,4200,1200,5700)
salesDF <- data.frame(Year,Sales)
The Year
column is a factor variable. I'd like to mutate a new column that has the observations in the Year column, grouped in an interval of 5 years. So that in the end, the trend of sales is in multiple of 5-year intervals.
I'd like my legend to have intervals of "2000", "2005", "2010", "2015"
How do I go about achieving this?
Upvotes: 1
Views: 566
Reputation: 5798
Base R solution:
agg_sales <- data.frame(do.call("cbind", (aggregate(. ~ Year,
within(salesDF, {Year <- floor(as.numeric(as.character(Year)) %/% 5) * 5}),
FUN = function(x) {
c(total_sales = sum(x, na.rm = TRUE), avg_sales = mean(x, na.rm = TRUE))}))))
Upvotes: -1
Reputation: 389325
You can use cut
/findInterval
to divide data in groups of 5 years.
library(dplyr)
salesDF %>%
group_by(grp = findInterval(Year, seq(min(Year), max(Year), 5))) %>%
summarise(Year = first(Year), Sales = sum(Sales)) %>%
ungroup() %>%
select(-grp)
# A tibble: 4 x 2
# Year Sales
# <chr> <dbl>
#1 2000 25500
#2 2005 21970
#3 2010 23000
#4 2015 5700
Or in data.table
library(data.table)
setDT(salesDF)[, .(Year = first(Year), Sales = sum(Sales)),
.(findInterval(Year, seq(min(Year), max(Year), 5)))]
data
Changing Year
column to numeric
salesDF$Year <- as.numeric(as.character(salesDF$Year))
Upvotes: 3
Reputation: 24888
Here's a simple approach grouping with cumsum
and modulus (%%
):
salesDF %>%
mutate(Group = cumsum(as.numeric(as.character(salesDF$Year)) %% 5 == 0)) %>%
group_by(Group) %>%
summarize(Year = first(Year), Mean = mean(Sales), Sum = sum(Sales))
# A tibble: 4 x 4
Group Year Mean Sum
<int> <fct> <dbl> <dbl>
1 1 2000 5100 25500
2 2 2005 4394 21970
3 3 2010 4600 23000
4 4 2015 5700 5700
Or as a new column without summarizing:
salesDF %>%
mutate(Group = cumsum(as.numeric(as.character(salesDF$Year)) %% 5 == 0)) %>%
group_by(Group) %>%
mutate(Mean = mean(Sales), Sum = sum(Sales))
# A tibble: 16 x 5
# Groups: Group [4]
Year Sales Group Mean Sum
<fct> <dbl> <int> <dbl> <dbl>
1 2000 2000 1 5100 25500
2 2001 4800 1 5100 25500
3 2002 6700 1 5100 25500
...
14 2013 4200 3 4600 23000
15 2014 1200 3 4600 23000
16 2015 5700 4 5700 5700
Upvotes: 6