Adam_G

Mutate by group based on a conditional

I am trying to add a summary column to a dataframe. Although the summary value should appear on every row, the statistic itself should be calculated only from the rows that meet a condition.

As an example, given this dataframe:

x <- data.frame(usernum = rep(c(1, 2, 3, 4), each = 3),
                final   = rep(c(TRUE, TRUE, FALSE, FALSE), times = 3),
                time    = 1:12)

I would like to add a user.mean column holding each user's mean time, where the mean is calculated only from that user's rows with final == TRUE. I have tried:
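For reference, the per-user target values can be computed in base R with `tapply()`. This is a minimal sketch that rebuilds `x` with an explicit `times = 3` in `rep()` (an assumption that matches the recycling in the original call); the name `target` is purely illustrative.

```r
x <- data.frame(usernum = rep(c(1, 2, 3, 4), each = 3),
                final   = rep(c(TRUE, TRUE, FALSE, FALSE), times = 3),
                time    = 1:12)

# Keep only the final rows, then average time within each usernum.
# For this data the means are 1.5, 5.5, 9 and 10 for users 1 to 4.
target <- tapply(x$time[x$final], x$usernum[x$final], mean)
target
```

These are the values that should end up on every row of the corresponding user.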

library(tidyverse)

x %>% 
  group_by(usernum) %>%
  mutate(user.mean = mean(x$time[x$final==TRUE]))

but this puts the same overall mean (5.5, the mean of all six final rows) on every row, rather than a per-user mean. I have also tried:
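A base R sketch makes the cause visible: the expression inside `mean()` indexes the full `x` data frame directly, so grouping never comes into play. It again assumes `x` is rebuilt with an explicit `times = 3`; `overall` is an illustrative name.

```r
x <- data.frame(usernum = rep(c(1, 2, 3, 4), each = 3),
                final   = rep(c(TRUE, TRUE, FALSE, FALSE), times = 3),
                time    = 1:12)

# x$time[x$final == TRUE] selects from the whole 12-row column,
# so it yields times 1, 2, 5, 6, 9, 10 regardless of any group_by()
overall <- mean(x$time[x$final == TRUE])
overall  # a single number, 5.5, which mutate() recycles to every row
```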

x %>% 
  group_by(usernum) %>%
  filter(final==TRUE) %>% 
  mutate(user.mean = mean(time))

but this returns only the filtered rows; the rows where final is FALSE are dropped:

# A tibble: 6 x 4
# Groups:   usernum [4]
  usernum final  time user.mean
    <dbl> <lgl> <int>     <dbl>
1       1 TRUE      1       1.5
2       1 TRUE      2       1.5
3       2 TRUE      5       5.5
4       2 TRUE      6       5.5
5       3 TRUE      9       9  
6       4 TRUE     10      10 

How can I apply those means to every original row?


Answers (1)

akrun

Using x$ inside mutate() after group_by() refers to the entire original column, not just the values in the current group. Also, final is already a logical vector, so it can be used directly as an index without == TRUE:

library(dplyr)
x %>%
     group_by(usernum) %>% 
     mutate(user.mean = mean(time[final]))
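A minimal sketch that runs the code above end to end and checks the result; it assumes dplyr is installed and rebuilds `x` with an explicit `times = 3`. The trailing `ungroup()` and the name `res` are additions for illustration only.

```r
library(dplyr)

x <- data.frame(usernum = rep(c(1, 2, 3, 4), each = 3),
                final   = rep(c(TRUE, TRUE, FALSE, FALSE), times = 3),
                time    = 1:12)

res <- x %>%
  group_by(usernum) %>%
  # time[final] is evaluated within each group, so only that
  # user's final rows contribute to the mean
  mutate(user.mean = mean(time[final])) %>%
  ungroup()

# All 12 original rows are kept, each carrying its own user's mean
nrow(res)
res$user.mean
```

If a user had no rows with final == TRUE, time[final] would be empty for that group and mean() would return NaN for it.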

If you do want to use $, the .data pronoun refers to the current group's data rather than the whole data frame:

x %>% 
    group_by(usernum) %>%
    mutate(user.mean = mean(.data$time[.data$final]))
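Another common pattern, not part of the answer above, is to compute one row per user from the filtered data and join it back onto every original row. A sketch, assuming dplyr 1.0 or later (for `summarise(.groups = )`); `user_means` and `res` are illustrative names.

```r
library(dplyr)

x <- data.frame(usernum = rep(c(1, 2, 3, 4), each = 3),
                final   = rep(c(TRUE, TRUE, FALSE, FALSE), times = 3),
                time    = 1:12)

# One mean per user, computed from the final rows only
user_means <- x %>%
  filter(final) %>%
  group_by(usernum) %>%
  summarise(user.mean = mean(time), .groups = "drop")

# Attach each user's mean back to all of that user's original rows
res <- x %>%
  left_join(user_means, by = "usernum")
```

The difference from the mutate() approach is in the edge case: a user with no final rows is absent from user_means, so the join gives NA for that user instead of NaN.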

