Nono_sad

Reputation: 443

R: compute the mean and count of values in a data frame using group_by

I have a data frame that I want to group by two columns, then compute the mean and the number of elements in each group:

test=data.frame(a=c(1,1,1,2,4,5,8,5,5,7),
                b=c('A','A','B','A','B','B','A','B','B','A'),
                c=runif(10, -5,0))

For rows that have the same 'a' and 'b', I want to compute the mean of 'c' and also know how many rows were used to compute that mean.

I tried this to compute the mean:

library(dplyr)
test_mean=test %>%
  group_by(a,b) %>% 
  summarise_at(vars("c"), mean) 
# %>% mutate(d = sum(a))


a   b      c      
1   A   -3.138246
1   B   -0.411621
2   A   -2.787820
4   B   -2.343191
5   B   -3.323057
7   A   -4.765974
8   A   -4.596118

But I also want a fourth column, d, with the number of rows in each group:

a   b      c          d    
1   A   -3.138246     2
1   B   -0.411621     1
2   A   -2.787820     1
4   B   -2.343191     1
5   B   -3.323057     3
7   A   -4.765974     1
8   A   -4.596118     1

Upvotes: 1

Views: 127

Answers (2)

akrun

Reputation: 887971

Using data.table, where .N is the number of rows in each group:

library(data.table)
setDT(test)[, .(c = mean(c), d = .N), .(a, b)]
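A slightly expanded, commented sketch of the same data.table idea, assuming the data.table package is installed. Two small changes from the one-liner above: it uses as.data.table() to work on a copy (setDT() converts 'test' in place), and keyby instead of by so the result is sorted by a, then b, like the expected table in the question. The set.seed() call is an addition for reproducibility only.

```r
library(data.table)

# Sample data from the question; set.seed() is added only so that the
# random 'c' values are reproducible
set.seed(1)
test <- data.frame(a = c(1, 1, 1, 2, 4, 5, 8, 5, 5, 7),
                   b = c('A', 'A', 'B', 'A', 'B', 'B', 'A', 'B', 'B', 'A'),
                   c = runif(10, -5, 0))

dt <- as.data.table(test)  # copy, so 'test' stays a plain data.frame

# .N = number of rows in the current group;
# keyby groups like `by` but also sorts the result by a, then b
res <- dt[, .(c = mean(c), d = .N), keyby = .(a, b)]
res
```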

Upvotes: 1

csgroen

Reputation: 2551

There's no need for summarize_at; plain summarize will do in this case:

library(tidyverse)
test <- data.frame(a=c(1,1,1,2,4,5,8,5,5,7),
                   b=c('A','A','B','A','B','B','A','B','B','A'),
                   c=runif(10, -5,0))
test %>% 
    group_by(a,b) %>% 
    summarize(c = mean(c), d = n())
#> `summarise()` regrouping output by 'a' (override with `.groups` argument)
#> # A tibble: 7 x 4
#> # Groups:   a [6]
#>       a b          c     d
#>   <dbl> <chr>  <dbl> <int>
#> 1     1 A     -2.83      2
#> 2     1 B     -0.992     1
#> 3     2 A     -2.92      1
#> 4     4 B     -4.83      1
#> 5     5 B     -3.19      3
#> 6     7 A     -0.639     1
#> 7     8 A     -2.25      1
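The "regrouping output" message appears because summarize() drops only the last grouping variable and leaves the result grouped by 'a'. A minimal sketch, assuming dplyr >= 1.0.0, that passes .groups = "drop" to get an ungrouped tibble and silence the message (set.seed() is added only for reproducibility):

```r
library(dplyr)

# Sample data from the question; set.seed() added only for reproducibility
set.seed(1)
test <- data.frame(a = c(1, 1, 1, 2, 4, 5, 8, 5, 5, 7),
                   b = c('A', 'A', 'B', 'A', 'B', 'B', 'A', 'B', 'B', 'A'),
                   c = runif(10, -5, 0))

res <- test %>%
  group_by(a, b) %>%
  # n() counts the rows in each group; .groups = "drop" returns an
  # ungrouped tibble and suppresses the regrouping message
  summarize(c = mean(c), d = n(), .groups = "drop")
res
```

Note that the attempted mutate(d = sum(a)) in the question would add up the values of 'a', not count rows, which is why n() is the function needed here.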

If you have multiple variables, consider using across() (dplyr >= 1.0.0).
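A minimal sketch of across() for several columns, assuming dplyr >= 1.0.0. The extra column 'e' is hypothetical, added to the question's data only to show more than one summarised variable; all_of(c("c", "e")) selects the columns by name, which avoids confusing the column 'c' with the c() function.

```r
library(dplyr)

# Question's data plus a hypothetical numeric column 'e';
# set.seed() is added only for reproducibility
set.seed(1)
test <- data.frame(a = c(1, 1, 1, 2, 4, 5, 8, 5, 5, 7),
                   b = c('A', 'A', 'B', 'A', 'B', 'B', 'A', 'B', 'B', 'A'),
                   c = runif(10, -5, 0),
                   e = runif(10, 0, 1))

res <- test %>%
  group_by(a, b) %>%
  summarize(across(all_of(c("c", "e")), mean),  # mean of each listed column
            d = n(),                            # rows per group
            .groups = "drop")
res
```

With a single function, across() keeps the original column names, so the result has columns a, b, c, e and d.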

Upvotes: 1
