Reputation: 281
Here is some example data:
# mtcars ships with base R (datasets package), so only dplyr is needed
library(dplyr)
df1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(
    newvar = sum(wt)
  )
# A tibble: 8 x 3
# Groups:   cyl [?]
    cyl  gear newvar
  <dbl> <dbl>  <dbl>
1     4     3   2.46
2     4     4  19.0
3     4     5   3.65
4     6     3   6.68
5     6     4  12.4
6     6     5   2.77
7     8     3  49.2
8     8     5   6.74
What if I then wanted to apply a custom function that calculates the difference between the newvar values for cars with 3 and 5 gears, within each level of cyl?
df2 <- df1 %>% mutate(Diff = newvar[gear == "3"] - newvar[gear == "5"])
or with summarise?
df2 <- df1 %>% summarise(Diff = newvar[gear == "3"] - newvar[gear == "5"])
Is there a way to apply a function to specific levels of one variable within each level of another?
Any help appreciated!
Upvotes: 1
Views: 170
Reputation: 887821
An option is to spread() the data into 'wide' format and then do the subtraction:
library(tidyverse)
df1 %>%
  filter(gear %in% c(3, 5)) %>%
  spread(gear, newvar) %>%
  transmute(newvar = `3` - `5`)
# A tibble: 3 x 2
# Groups:   cyl [3]
#     cyl newvar
#   <dbl>  <dbl>
#1     4  -1.19
#2     6   3.90
#3     8  42.5
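Since spread() is superseded in tidyr 1.0.0 and later, the same reshaping can be sketched with pivot_wider(). This is only an illustration, not part of the original answer: it assumes a tidyr version that provides pivot_wider(), and the gear_ prefix and the names wide and res are made up for the example.

```r
library(dplyr)
library(tidyr)

# Rebuild the summary table from the question; ungroup() so that the
# result does not depend on how grouping is carried through the reshape
df1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt)) %>%
  ungroup()

# One row per cyl, with one column per gear value (gear_3, gear_5)
wide <- df1 %>%
  filter(gear %in% c(3, 5)) %>%
  pivot_wider(names_from = gear, values_from = newvar, names_prefix = "gear_")

# Difference between the 3-gear and 5-gear totals for each cyl
res <- wide %>%
  transmute(cyl, Diff = gear_3 - gear_5)
```

The prefix avoids column names that start with a digit, so no backticks are needed in the subtraction.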
Upvotes: 2
Reputation: 60180
Your example code is most of the way there. You can do:
df1 %>%
  mutate(Diff = newvar[gear == "3"] - newvar[gear == "5"])
Or:
df1 %>%
  summarise(Diff = newvar[gear == "3"] - newvar[gear == "5"])
Logical subsetting still works in mutate() and summarise() calls, just as it does with any other vector.
Note that this works because df1 is still grouped by cyl after the summarise() call in your example code: summarise() drops only the last grouping variable (here gear). Otherwise you would need a group_by() call to create the correct grouping.
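For the case where df1 arrives without any grouping, a minimal sketch of the explicit group_by() step might look like this. The ungroup() call is there only to simulate an ungrouped input, and df2 is just the name used in the question.

```r
library(dplyr)

# Rebuild the summary table, then drop the grouping on purpose
df1 <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(newvar = sum(wt)) %>%
  ungroup()

# Regroup by cyl so the logical subsetting runs within each cylinder level
df2 <- df1 %>%
  group_by(cyl) %>%
  summarise(Diff = newvar[gear == 3] - newvar[gear == 5])
```

Without the group_by(cyl) step, the subsetting would select three values on each side across all cylinder levels at once, rather than one pair per cylinder.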
Upvotes: 4