Custom function with dplyr mutate or summarise for different levels within a factor?

Question

Here is some example data:

library(dplyr)

# mtcars ships with base R (datasets package), so no extra package is needed
df1 <- mtcars %>%
    group_by(cyl, gear) %>%
    summarise(
        newvar = sum(wt)
    )
# A tibble: 8 x 3
# Groups:   cyl [?]
    cyl  gear newvar
  <dbl> <dbl>  <dbl>
1     4     3   2.46
2     4     4  19.0 
3     4     5   3.65
4     6     3   6.68
5     6     4  12.4 
6     6     5   2.77
7     8     3  49.2 
8     8     5   6.74

What if I then wanted to apply a custom function that calculates, for each level of cyl, the difference between the newvar values for cars with 3 gears and cars with 5 gears?

df2 <- df1 %>%  mutate(Diff = newvar[gear == "3"] - newvar[gear == "5"])

or with summarise?

df2 <- df1 %>%  summarise(Diff = newvar[gear == "3"] - newvar[gear == "5"])

Is there a way to apply a function to specific levels of one factor within each level of another?

Any help appreciated!

Marius · Accepted Answer

Your example code is most of the way there. You can do:

df1 %>% 
    mutate(Diff = newvar[gear == "3"] - newvar[gear == "5"])

Or:

df1 %>% 
    summarise(Diff = newvar[gear == "3"] - newvar[gear == "5"])

Logical subsetting works inside mutate() and summarise() calls just as it does on any other vector.
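To see what that subsetting does on its own, here is a minimal sketch outside of dplyr. The two vectors are just the rounded cyl == 4 rows copied from the printed output in the question, and diff_4 is a name made up for illustration:

```r
# Rounded values for the cyl == 4 rows, copied from the question's output
gear   <- c(3, 4, 5)
newvar <- c(2.46, 19.0, 3.65)

# gear == 3 is a logical vector (TRUE, FALSE, FALSE); indexing newvar with it
# keeps only the matching element, so each side below has length one
diff_4 <- newvar[gear == 3] - newvar[gear == 5]

# Comparing against "3" gives the same logical vector, because R coerces the
# numbers to character before comparing; gear == 3 is just more direct
same_mask <- identical(gear == "3", gear == 3)
```

Inside mutate() or summarise() on grouped data, the same expression is evaluated once per group, with gear and newvar restricted to that group's rows.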

Note that this works because df1 is still grouped by cyl after the summarise() call in your example code: summarise() drops only the last grouping variable (gear). Otherwise, you would need a group_by() call to create the correct grouping.
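If the grouping has been dropped, for example by ungroup(), you can restore it explicitly. A rough sketch, assuming every cyl level has exactly one row for gear 3 and one for gear 5 (true for this summary of mtcars, but not guaranteed for other data):

```r
library(dplyr)

# Recreate the summary from the question, then drop the grouping on purpose
df1 <- mtcars %>%
    group_by(cyl, gear) %>%
    summarise(newvar = sum(wt)) %>%
    ungroup()

# With no grouping left, group by cyl explicitly before taking the difference
df2 <- df1 %>%
    group_by(cyl) %>%
    summarise(Diff = newvar[gear == 3] - newvar[gear == 5])
```

Without the group_by(cyl) step, the subsets would pick up the gear 3 and gear 5 rows from all cylinder levels at once, rather than one pair per level.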
