LiveLongandProsper

Reputation: 281

Custom function with dplyr mutate or summarise for different levels within a factor?

Here is some example data:

library(dplyr)

df1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt))
# A tibble: 8 x 3
# Groups:   cyl [?]
    cyl  gear newvar
  <dbl> <dbl>  <dbl>
1     4     3   2.46
2     4     4  19.0 
3     4     5   3.65
4     6     3   6.68
5     6     4  12.4 
6     6     5   2.77
7     8     3  49.2 
8     8     5   6.74

What if I then wanted to apply a custom function that calculates, for each level of cyl, the difference between the newvar values for cars with 3 gears and cars with 5 gears? For example with mutate():

df2 <- df1 %>% mutate(Diff = newvar[gear == "3"] - newvar[gear == "5"])

Or with summarise()?

df2 <- df1 %>% summarise(Diff = newvar[gear == "3"] - newvar[gear == "5"])

There must be a way to apply functions to different levels of one factor within each level of another?

Any help appreciated!

Upvotes: 1

Views: 170

Answers (2)

akrun

Reputation: 887821

One option is to spread into 'wide' format and then subtract one column from the other:

library(tidyverse)
df1 %>%
   filter(gear %in% c(3, 5) ) %>% 
   spread(gear, newvar) %>% 
   transmute(newvar = `3` - `5`)
# A tibble: 3 x 2
# Groups:   cyl [3]
#    cyl newvar
#  <dbl>  <dbl>
#1     4  -1.19
#2     6   3.90
#3     8  42.5 
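spread() has since been superseded by pivot_wider() (tidyr 1.0.0 and later). A minimal sketch of the same wide-format approach with pivot_wider(), assuming dplyr >= 1.0.0 for the `.groups` argument. The final select() keeps cyl explicitly, so the result does not depend on whether pivot_wider() preserves the grouping.

```r
library(dplyr)
library(tidyr)

# Rebuild df1 as in the question; .groups = "drop_last" leaves it grouped by cyl
df1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt), .groups = "drop_last")

# Keep only the 3- and 5-gear totals, turn them into columns named `3` and `5`
# (one row per cyl), then subtract
wide_diff <- df1 %>%
  filter(gear %in% c(3, 5)) %>%
  pivot_wider(names_from = gear, values_from = newvar) %>%
  mutate(newvar = `3` - `5`) %>%
  select(cyl, newvar)

wide_diff
```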

Upvotes: 2

Marius

Reputation: 60180

Your example code is most of the way there. You can do:

df1 %>% 
    mutate(Diff = newvar[gear == "3"] - newvar[gear == "5"])

Or:

df1 %>% 
    summarise(Diff = newvar[gear == "3"] - newvar[gear == "5"])

Logical subsetting works inside mutate() and summarise() calls just as it does on any other vector.
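Since the question title asks about a custom function, the same subsetting can be wrapped in one. This is a sketch only: level_diff() is a hypothetical helper, not part of dplyr, and the code assumes dplyr >= 1.0.0 for `.groups`. Wrapping each subset in dplyr's first() means a group that lacks one of the levels gives NA instead of a zero-length result. If a level appeared more than once in a group, first() would keep only its first value; that cannot happen in df1, which has one row per cyl/gear pair.

```r
library(dplyr)

# Rebuild df1 as in the question; it stays grouped by cyl
df1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt), .groups = "drop_last")

# Hypothetical helper: value at level `a` of `by` minus the value at level `b`,
# evaluated within the current group; first() turns an empty subset into NA
level_diff <- function(value, by, a, b) {
  first(value[by == a]) - first(value[by == b])
}

# 3 gears minus 5 gears, one row per cyl
diffs <- df1 %>%
  summarise(Diff = level_diff(newvar, gear, a = 3, b = 5))

# 8-cylinder cars have no 4-gear rows, so their Diff is NA here
diffs_34 <- df1 %>%
  summarise(Diff = level_diff(newvar, gear, a = 3, b = 4))

diffs
diffs_34
```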

Note that the calls above work only because df1 is still grouped by cyl after the summarise() call in your example code (summarise() drops only the last grouping variable, gear). Otherwise you would need to call group_by(cyl) first to get the correct grouping.
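To illustrate the regrouping case, here is a small sketch that deliberately builds an ungrouped df1 (using `.groups = "drop"`, which assumes dplyr >= 1.0.0) and restores the per-cylinder grouping with group_by(cyl) before subsetting by gear.

```r
library(dplyr)

# An ungrouped version of df1: .groups = "drop" removes all grouping
df1_flat <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt), .groups = "drop")

# Regroup by cyl so each subset by gear is taken within one cylinder count
by_cyl <- df1_flat %>%
  group_by(cyl) %>%
  summarise(Diff = newvar[gear == 3] - newvar[gear == 5])

by_cyl
```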

Upvotes: 4
