andrew
andrew

Reputation: 2077

'Grouping' Factor Observations in R

I have a data frame that has a similar structure to this

Year <- c("2000", "2001", "2002" ,"2003", "2004", "2005" ,"2006", "2007", "2008", "2009", "2010", "2011" ,"2012", "2013", "2014", "2015")
Sales <- c(2000,4800,6700,5000,7000,8000,3070,2000,1800,7100,6600,5000,6000,4200,1200,5700)
salesDF <- data.frame(Year,Sales)

The Year column is a factor variable. I'd like to mutate a new column that has the observations in the Year column, grouped in an interval of 5 years. So that in the end, the trend of sales is in multiple of 5-year intervals.

I'd like my legend to have intervals of "2000", "2005", "2010", "2015"

How do I go about achieving this?

Upvotes: 1

Views: 566

Answers (3)

hello_friend
hello_friend

Reputation: 5798

Base R solution:

agg_sales <- data.frame(do.call("cbind", (aggregate(. ~ Year, 
                       within(salesDF, {Year <- floor(as.numeric(as.character(Year)) %/% 5) * 5}),
                       FUN = function(x) {
                         c(total_sales = sum(x, na.rm = TRUE), avg_sales = mean(x, na.rm = TRUE))}))))

Upvotes: -1

Ronak Shah
Ronak Shah

Reputation: 389325

You can use cut/findInterval to divide data in groups of 5 years.

library(dplyr)

salesDF %>%
  group_by(grp = findInterval(Year, seq(min(Year), max(Year), 5))) %>%
  summarise(Year = first(Year), Sales = sum(Sales)) %>%
  ungroup() %>%
  select(-grp)

# A tibble: 4 x 2
#  Year  Sales
#  <chr> <dbl>
#1 2000  25500
#2 2005  21970
#3 2010  23000
#4 2015   5700

Or in data.table

library(data.table)
setDT(salesDF)[, .(Year = first(Year), Sales = sum(Sales)), 
                  .(findInterval(Year, seq(min(Year), max(Year), 5)))]

data

Changing Year column to numeric

salesDF$Year <- as.numeric(as.character(salesDF$Year))

Upvotes: 3

Ian Campbell
Ian Campbell

Reputation: 24888

Here's a simple approach grouping with cumsum and modulus (%%):

salesDF %>% 
  mutate(Group = cumsum(as.numeric(as.character(salesDF$Year)) %% 5 == 0)) %>%
  group_by(Group) %>%
  summarize(Year = first(Year), Mean = mean(Sales), Sum = sum(Sales))
# A tibble: 4 x 4
  Group Year   Mean   Sum
  <int> <fct> <dbl> <dbl>
1     1 2000   5100 25500
2     2 2005   4394 21970
3     3 2010   4600 23000
4     4 2015   5700  5700

Or as a new column without summarizing:

salesDF %>% 
  mutate(Group = cumsum(as.numeric(as.character(salesDF$Year)) %% 5 == 0)) %>%
  group_by(Group) %>%
  mutate(Mean = mean(Sales), Sum = sum(Sales))
# A tibble: 16 x 5
# Groups:   Group [4]
   Year  Sales Group  Mean   Sum
   <fct> <dbl> <int> <dbl> <dbl>
 1 2000   2000     1  5100 25500
 2 2001   4800     1  5100 25500
 3 2002   6700     1  5100 25500
...
14 2013   4200     3  4600 23000
15 2014   1200     3  4600 23000
16 2015   5700     4  5700  5700

Upvotes: 6

Related Questions