Reputation: 7879
I am trying to add a summary column to a dataframe. The summary statistic should be attached to every row, but the statistic itself should be calculated only from the rows that meet a condition.
As an example, given this dataframe:
x <- data.frame(usernum = rep(c(1, 2, 3, 4), each = 3),
                final = rep(c(TRUE, TRUE, FALSE, FALSE)),
                time = 1:12)
I would like to add a usernum.mean column, but where the mean is only calculated from rows where final=TRUE. I have tried:
library(tidyverse)
x %>%
  group_by(usernum) %>%
  mutate(user.mean = mean(x$time[x$final==TRUE]))
but this gives an overall mean, rather than by user. I have also tried:
x %>%
  group_by(usernum) %>%
  filter(final==TRUE) %>%
  mutate(user.mean = mean(time))
but this only returns the filtered dataframe:
# A tibble: 6 x 4
# Groups:   usernum [4]
  usernum final  time user.mean
    <dbl> <lgl> <int>     <dbl>
1       1 TRUE      1       1.5
2       1 TRUE      2       1.5
3       2 TRUE      5       5.5
4       2 TRUE      6       5.5
5       3 TRUE      9       9
6       4 TRUE     10      10
How can I apply those means to every original row?
Upvotes: 1
Views: 1194
Reputation: 886948
If we use x$ after the group_by, it returns the entire column instead of only the values in that particular group, so the grouping is ignored. Second, final is already a logical (TRUE/FALSE) vector, so we don't need == TRUE and can use it directly as the index:
library(dplyr)
x %>%
  group_by(usernum) %>%
  # bare column names refer to the current group's values only
  mutate(user.mean = mean(time[final]))
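To see why x$ defeats the grouping, a quick check of the vector lengths inside a grouped call can help. This is only a sketch, assuming dplyr is installed and using the data frame from the question; the names lens, bare and dollar are made up for illustration.

```r
library(dplyr)

# Data frame from the question
x <- data.frame(usernum = rep(c(1, 2, 3, 4), each = 3),
                final = rep(c(TRUE, TRUE, FALSE, FALSE)),
                time = 1:12)

# Compare what a bare column name and x$ see inside each group
lens <- x %>%
  group_by(usernum) %>%
  summarise(bare = length(time),      # rows of the current group only
            dollar = length(x$time))  # the whole column, grouping ignored

lens
```

Each group should report bare = 3 and dollar = 12, which is why mean(x$time[x$final]) gives the same overall mean for every user.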
The one option where we can use $ is with the .data pronoun, which refers to the data of the current group:
x %>%
  group_by(usernum) %>%
  mutate(user.mean = mean(.data$time[.data$final]))
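For comparison, the filter-based attempt from the question can also be made to work by summarising the filtered rows and joining the result back onto the original data. This is a sketch of an alternative, not part of the answer above; the names user_means, joined and direct are illustrative, and .groups = "drop" assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Data frame from the question
x <- data.frame(usernum = rep(c(1, 2, 3, 4), each = 3),
                final = rep(c(TRUE, TRUE, FALSE, FALSE)),
                time = 1:12)

# One mean per user, computed from the final rows only
user_means <- x %>%
  filter(final) %>%
  group_by(usernum) %>%
  summarise(user.mean = mean(time), .groups = "drop")

# Attach those means to every original row
joined <- left_join(x, user_means, by = "usernum")

# The in-place approach, for comparison
direct <- x %>%
  group_by(usernum) %>%
  mutate(user.mean = mean(time[final])) %>%
  ungroup()
```

On this data both give the same user.mean column. They differ for a user with no final=TRUE rows: the join leaves NA there, while mean(time[final]) returns NaN because it averages an empty vector.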
Upvotes: 3