Reputation: 567
I have a question about combining factors with dplyr. In the sample df below I'd like to combine factors a and c into a new factor q per year, and sum their values. I know that I can group_by(year), but how do I also group by q (a and c combined), l, b and y? (In reality, I want to combine three factor levels out of 12 by year.)
year factor value
1977 a 564907
1977 c 349651
1977 l 2852949
1978 a 504028
1978 1 413120
1978 y 2553088
1979 a 497766
1979 c 789007
1979 b 1567934
1980 a 346892
I want:
year factor value
1977 q 564907 + 349651
1977 l 2852949
1978 q 504028
1978 1 413120
1978 y 2553088
1979 q 497766 + 789007
1979 b 1567934
1980 q 346892
Thanks in advance.
Upvotes: 1
Views: 1008
Reputation: 886938
This could be done with recode from car. The group by operation can be done with data.table. We convert the 'data.frame' to 'data.table' (setDT(df1)), recode the 'factor' variable to convert the levels 'a' and 'c' to 'q', use that as grouping variable along with 'year', and get the sum of 'value'.
library(car)
library(data.table)
setDT(df1)[, list(value=sum(value)) ,
.(factor=recode(factor, "c('a', 'c')='q'"), year)]
# factor year value
#1: q 1977 914558
#2: l 1977 2852949
#3: q 1978 504028
#4: 1 1978 413120
#5: y 1978 2553088
#6: q 1979 1286773
#7: b 1979 1567934
#8: q 1980 346892
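For readers who would rather not load car, the same recode-then-aggregate idea can be sketched in base R. This is a minimal sketch and not part of the answer above: it swaps in `levels<-` and `aggregate()` for `recode` and data.table, and it rebuilds the question's sample data under the name `df1` (the "1" level is kept exactly as posted).

```r
# Sample data copied from the question
df1 <- data.frame(
  year   = c(1977, 1977, 1977, 1978, 1978, 1978, 1979, 1979, 1979, 1980),
  factor = factor(c("a", "c", "l", "a", "1", "y", "a", "c", "b", "a")),
  value  = c(564907, 349651, 2852949, 504028, 413120, 2553088,
             497766, 789007, 1567934, 346892)
)

# Assigning the same name to several levels merges them into one level
levels(df1$factor)[levels(df1$factor) %in% c("a", "c")] <- "q"

# Sum 'value' within each year/factor combination
res <- aggregate(value ~ year + factor, data = df1, FUN = sum)

# Order by year for easier reading
res[order(res$year), ]
```

To merge three or more levels, extend the vector `c("a", "c")` with the additional level names.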
Upvotes: 0
Reputation: 7190
Here is a solution. Not elegant but it works well I guess.
library(dplyr)
df %>%
  # Relabel 'a' and 'c' as 'q'; keep every other row's own label.
  # as.character(factor) is used (not levels(factor)) so that each row keeps its own value.
  mutate(index = ifelse(factor %in% c("a", "c"), "q", as.character(factor))) %>%
  group_by(year, index) %>%
  summarise(sum(value))
Source: local data frame [8 x 3]
Groups: year [?]
year index sum(value)
(int) (chr) (int)
1 1977 l 2852949
2 1977 q 914558
3 1978 1 413120
4 1978 q 504028
5 1978 y 2553088
6 1979 b 1567934
7 1979 q 1286773
8 1980 q 346892
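Since the question mentions combining three levels out of 12 in the real data, the same approach could be generalised along these lines. This is a sketch under assumptions: `merge_levels` is a hypothetical helper vector, the data frame is rebuilt from the question's sample, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Sample data copied from the question (the "1" level is kept as posted)
df <- data.frame(
  year   = c(1977, 1977, 1977, 1978, 1978, 1978, 1979, 1979, 1979, 1980),
  factor = factor(c("a", "c", "l", "a", "1", "y", "a", "c", "b", "a")),
  value  = c(564907, 349651, 2852949, 504028, 413120, 2553088,
             497766, 789007, 1567934, 346892)
)

# Hypothetical vector of levels to merge; add a third name here as needed
merge_levels <- c("a", "c")

result <- df %>%
  # Overwrite the label with "q" for merged levels, keep the others as text
  mutate(factor = ifelse(factor %in% merge_levels, "q", as.character(factor))) %>%
  group_by(year, factor) %>%
  # Name the summary column and drop the grouping afterwards
  summarise(value = sum(value), .groups = "drop")

result
```

Keeping the levels to merge in one vector means the pipeline itself does not change when the list of levels grows.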
Upvotes: 3