zazizoma

Reputation: 567

combine sets of factors in a dataframe with dplyr

I have a question about combining factor levels with dplyr. In the sample df below, I'd like to combine levels a and c into a new level q per year, and sum their values. I know that I can group_by(year), but how do I also group by q (= a and c), l, b and y? (In reality, I want to combine three factor levels out of 12 by year.)

year  factor    value   
1977     a      564907 
1977     c      349651
1977     l     2852949  
1978     a      504028  
1978     1      413120  
1978     y     2553088 
1979     a      497766 
1979     c      789007 
1979     b     1567934
1980     a      346892

I want:

year  factor    value   
1977     q      564907 + 349651
1977     l     2852949  
1978     q      504028  
1978     1      413120  
1978     y     2553088 
1979     q      497766 + 789007 
1979     b     1567934
1980     q      346892

Thanks in advance.

Upvotes: 1

Views: 1008

Answers (2)

akrun

Reputation: 886938

This can be done with recode from the car package, with the grouping handled by data.table. We convert the 'data.frame' to a 'data.table' (setDT(df1)), recode the 'factor' variable so that the levels 'a' and 'c' both become 'q', use the recoded variable together with 'year' as the grouping variables, and take the sum of 'value' within each group.

library(car)
library(data.table)
setDT(df1)[, list(value=sum(value)) ,
         .(factor=recode(factor, "c('a', 'c')='q'"), year)]
#  factor year   value
#1:      q 1977  914558
#2:      l 1977 2852949
#3:      q 1978  504028
#4:      1 1978  413120
#5:      y 1978 2553088
#6:      q 1979 1286773
#7:      b 1979 1567934
#8:      q 1980  346892
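
If loading car only for recode is not desirable, the same level collapse can be sketched in base R. This is a minimal sketch, not part of the original answer: the data frame construction is my own reconstruction of the question's sample data (including the level '1' exactly as posted), and `res` is just an illustrative name. It relies on the fact that assigning the same name to several levels merges them.

```r
# Rebuild the sample data from the question (reconstructed; layout assumed)
df1 <- data.frame(
  year   = c(1977, 1977, 1977, 1978, 1978, 1978, 1979, 1979, 1979, 1980),
  factor = factor(c("a", "c", "l", "a", "1", "y", "a", "c", "b", "a")),
  value  = c(564907, 349651, 2852949, 504028, 413120, 2553088,
             497766, 789007, 1567934, 346892)
)

# Giving several levels the same name merges them into a single level
levels(df1$factor)[levels(df1$factor) %in% c("a", "c")] <- "q"

# Sum 'value' within each year / collapsed-level combination
res <- aggregate(value ~ year + factor, data = df1, FUN = sum)
res[order(res$year), ]
```

To merge three levels instead of two, extend the vector on the right of %in%.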

Upvotes: 0

SabDeM

Reputation: 7190

Here is a solution. Not elegant but it works well I guess.

library(dplyr)

df %>% 
       # as.character(factor) keeps each row's own label;
       # levels(factor) would be recycled across rows and mislabel them
       mutate(index = ifelse(factor %in% c("a", "c"), "q", as.character(factor))) %>%
       group_by(year, index) %>%
       summarise(sum(value))

Source: local data frame [8 x 3]
Groups: year [?]

   year index sum(value)
  (int) (chr)      (int)
1  1977     l    2852949
2  1977     q     914558
3  1978     1     413120
4  1978     q     504028
5  1978     y    2553088
6  1979     b    1567934
7  1979     q    1286773
8  1980     q     346892
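
For the real case of merging three levels out of 12, the same idea can be written with the levels to merge held in a vector. This is only a sketch under a few assumptions: the data frame is my reconstruction of the question's sample, `to_merge` and `res` are hypothetical names, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Reconstructed sample data from the question (layout assumed)
df <- data.frame(
  year   = c(1977, 1977, 1977, 1978, 1978, 1978, 1979, 1979, 1979, 1980),
  factor = factor(c("a", "c", "l", "a", "1", "y", "a", "c", "b", "a")),
  value  = c(564907, 349651, 2852949, 504028, 413120, 2553088,
             497766, 789007, 1567934, 346892)
)

# Levels to collapse into "q"; add a third level here as needed
to_merge <- c("a", "c")

res <- df %>%
  # Replace the merged levels with "q", keep every other label as-is
  mutate(factor = if_else(factor %in% to_merge, "q", as.character(factor))) %>%
  group_by(year, factor) %>%
  # Name the summed column and drop the grouping afterwards
  summarise(value = sum(value), .groups = "drop")

res
```

Overwriting the factor column keeps the result in the same year / factor / value shape the question asks for.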

Upvotes: 3
