Reputation: 443
I have a data frame that I want to group by two columns. For each group, I want to compute the mean of a third column and also know how many rows went into that mean:
test=data.frame(a=c(1,1,1,2,4,5,8,5,5,7),
b=c('A','A','B','A','B','B','A','B','B','A'),
c=runif(10, -5,0))
For rows that share the same 'a' and 'b', I want to compute the mean of 'c' and know the number of rows used to compute that mean.
I tried this to compute the mean:
test_mean=test %>%
group_by(a,b) %>%
summarise_at(vars("c"), mean)
# %>% mutate(d = sum(a))
a b c
1 A -3.138246
1 B -0.411621
2 A -2.787820
4 B -2.343191
5 B -3.323057
7 A -4.765974
8 A -4.596118
But I also want a fourth column with the number of rows in each group:
a b c d
1 A -3.138246 2
1 B -0.411621 1
2 A -2.787820 1
4 B -2.343191 1
5 B -3.323057 3
7 A -4.765974 1
8 A -4.596118 1
Upvotes: 1
Views: 127
Reputation: 887971
Using data.table, where .N gives the number of rows in each group:
library(data.table)
setDT(test)[, .(c = mean(c), d = .N), .(a, b)]
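As a variation on the one-liner above, here is a minimal sketch that leaves the original data frame untouched and returns the groups in sorted order. It assumes fixed values for 'c' (instead of runif) so the result is reproducible. It also uses as.data.table() and keyby, which are choices made for this illustration and not part of the answer above.

```r
library(data.table)

# Assumption: fixed 'c' values replace runif() so the output is reproducible
test <- data.frame(a = c(1, 1, 1, 2, 4, 5, 8, 5, 5, 7),
                   b = c("A", "A", "B", "A", "B", "B", "A", "B", "B", "A"),
                   c = c(-1, -3, -2, -4, -5, -1, -2, -2, -3, -4))

# as.data.table() makes a copy, so 'test' stays a plain data.frame;
# keyby groups by (a, b) and also sorts the result by those columns
res <- as.data.table(test)[, .(c = mean(c), d = .N), keyby = .(a, b)]
print(res)
```

setDT() converts the data frame by reference, which avoids a copy but changes the class of 'test' itself. The copy made here is only worthwhile if the original object has to stay a data.frame.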
Upvotes: 1
Reputation: 2551
No need for summarize_at; plain summarize will do in this case:
library(tidyverse)
test <- data.frame(a=c(1,1,1,2,4,5,8,5,5,7),
b=c('A','A','B','A','B','B','A','B','B','A'),
c=runif(10, -5,0))
test %>%
group_by(a,b) %>%
summarize(c = mean(c), d = n())
#> `summarise()` regrouping output by 'a' (override with `.groups` argument)
#> # A tibble: 7 x 4
#> # Groups: a [6]
#> a b c d
#> <dbl> <chr> <dbl> <int>
#> 1 1 A -2.83 2
#> 2 1 B -0.992 1
#> 3 2 A -2.92 1
#> 4 4 B -4.83 1
#> 5 5 B -3.19 3
#> 6 7 A -0.639 1
#> 7 8 A -2.25 1
If you have multiple variables, then consider using across (dplyr >= 1.0.0).
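A minimal sketch of what the across suggestion might look like. The second numeric column 'c2' and the data frame name 'test2' are hypothetical and added only for illustration, and the values are fixed so the result is reproducible:

```r
library(dplyr)

# Hypothetical data: 'c2' is an extra numeric column, not in the question
test2 <- data.frame(a  = c(1, 1, 1, 2),
                    b  = c("A", "A", "B", "A"),
                    c  = c(-1, -3, -2, -4),
                    c2 = c(10, 20, 30, 40))

res <- test2 %>%
  group_by(a, b) %>%
  # across() applies mean to each listed column; n() counts rows per group
  summarise(across(c(c, c2), mean), d = n(), .groups = "drop")

print(res)
```

Setting .groups = "drop" returns an ungrouped tibble and silences the regrouping message shown in the output above.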
Upvotes: 1