Samet Sökel

Reputation: 2660

Summarising a column that is not specified in the summarise() call with dplyr

I am trying to reduce the size of my code with the following approach:

library(dplyr)

set.seed(1453)

summarise_funs <- c('mean','median','sum')

iris %>%
  mutate(y = rnorm(nrow(.), mean = 2, sd = 3)) %>%
  group_by(Species) %>%
  summarise(stat = get(summarise_funs[3])(Sepal.Width))
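A minimal variation on the same idea, assuming the goal is only to choose a summary function by its name: base R's `match.fun()` (swapped in here in place of `get()`, not part of the original code) looks the name up as a function, so a non-function object that happens to be called `sum` will not be picked up by mistake.

```r
library(dplyr)

set.seed(1453)

summarise_funs <- c("mean", "median", "sum")

# match.fun() resolves a function name to the function itself,
# skipping any non-function objects that share the name
chosen_fun <- match.fun(summarise_funs[3])

res <- iris %>%
  mutate(y = rnorm(nrow(.), mean = 2, sd = 3)) %>%
  group_by(Species) %>%
  summarise(stat = chosen_fun(Sepal.Width))

print(res)
```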

It works fine. However, I also want to summarise the column y (which I create with mutate()) by its group means, while Sepal.Width is still the column specified in summarise().

In practice, my code should look like this:

library(dplyr)

set.seed(1453)

# 'y_mean' is the function I want to write; it is not defined yet
summarise_funs <- c('mean','median','sum','y_mean')

iris %>%
  mutate(y = rnorm(nrow(.), mean = 2, sd = 3)) %>%
  group_by(Species) %>%
  summarise(stat = get(summarise_funs[4])(Sepal.Width))

and the output should be the group means of y:

  Species     stat
  <fct>      <dbl>
1 setosa      1.81
2 versicolor  1.85
3 virginica   2.34

Is it possible to write a y_mean function that returns the group means of y without y being named in summarise(), given that Sepal.Width is the column already passed to it?

If so, what should the function look like?

Thanks in advance.

Upvotes: 0

Views: 46

Answers (2)

user10917479

Maybe using a named list will help you get what you want, but I am still a bit unclear on the exact expected behavior.

library(dplyr)

set.seed(1453)

summarise_funs <- list('mean','median','sum', y = 'mean')

iris %>% 
  mutate(y=rnorm(nrow(.),mean=2,sd=3)) %>% 
  group_by(Species) %>% 
  summarise(across(all_of(names(summarise_funs[4])), summarise_funs[4],
                   .names = paste0("{.col}_", summarise_funs[4])))

# # A tibble: 3 x 2
#   Species    y_mean
#   <fct>       <dbl>
# 1 setosa       1.81
# 2 versicolor   1.85
# 3 virginica    2.34
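Along the same lines as the named list above, a lookup table can store both the column and the function for each label, so that selecting `"y_mean"` picks up `y` even though the `summarise()` call only refers to the chosen spec. This is a sketch, not either answer's method: the names `summary_specs`, `col` and `fn` are hypothetical, and it assumes a dplyr version that supports the `.data` pronoun.

```r
library(dplyr)

set.seed(1453)

# Hypothetical lookup table: each label maps to a column name and a function
summary_specs <- list(
  mean   = list(col = "Sepal.Width", fn = mean),
  median = list(col = "Sepal.Width", fn = median),
  sum    = list(col = "Sepal.Width", fn = sum),
  y_mean = list(col = "y",           fn = mean)
)

df <- iris %>%
  mutate(y = rnorm(nrow(.), mean = 2, sd = 3))

spec <- summary_specs[["y_mean"]]

# .data[[spec$col]] looks the column up by name within each group
res <- df %>%
  group_by(Species) %>%
  summarise(stat = spec$fn(.data[[spec$col]]))

print(res)
```

Switching to another summary is then just a matter of indexing `summary_specs` with a different label; the `summarise()` call stays the same.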

Upvotes: 1

Limey

Reputation: 12451

Perhaps this gives you something like what you want, but I am still utterly unclear about what you want to do and why you expect calling an undefined function to work...

library(dplyr)
set.seed(1453)
summarise_funs <- list('mean','median','sum')

iris %>% 
  mutate(y=rnorm(nrow(.),mean=2,sd=3)) %>% 
  group_by(Species) %>% 
  summarise(across(y, summarise_funs)) 
# A tibble: 3 × 4
  Species      y_1   y_2   y_3
  <fct>      <dbl> <dbl> <dbl>
1 setosa      2.47  2.30 124. 
2 versicolor  2.52  2.33 126. 
3 virginica   1.77  1.90  88.5

whereas

iris %>%
  mutate(y = rnorm(nrow(.), mean = 2, sd = 3)) %>%
  group_by(Species) %>%
  summarise(across(c(Sepal.Width, y), summarise_funs))
# A tibble: 3 × 7
  Species    Sepal.Width_1 Sepal.Width_2 Sepal.Width_3   y_1   y_2   y_3
  <fct>              <dbl>         <dbl>         <dbl> <dbl> <dbl> <dbl>
1 setosa              3.43           3.4          171.  1.76  2.09  88.0
2 versicolor          2.77           2.8          138.  2.04  1.95 102. 
3 virginica           2.97           3            149.  2.43  2.12 121. 

The .names argument to across (and the ability to use a named list of functions) gives more control over the column names in the output object.
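For example, naming the list of functions lets the `{.fn}` placeholder in `.names` produce labels such as `y_mean` instead of `y_1`. A short sketch, with the default `"{.col}_{.fn}"` template written out explicitly:

```r
library(dplyr)

set.seed(1453)

# Named list: the names become the {.fn} part of each output column name
summarise_funs <- list(mean = mean, median = median, sum = sum)

res <- iris %>%
  mutate(y = rnorm(nrow(.), mean = 2, sd = 3)) %>%
  group_by(Species) %>%
  summarise(across(c(Sepal.Width, y), summarise_funs, .names = "{.col}_{.fn}"))

print(names(res))
```

The result has the columns Species, Sepal.Width_mean, Sepal.Width_median, Sepal.Width_sum, y_mean, y_median and y_sum, so the group means of y end up in a column called y_mean.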

Upvotes: 1
