Reputation: 77
I am trying to perform what should be a simple exercise of aggregating columns in a dataframe by the first column. Each column is aggregated with a summary statistic, but they are not the same. For example several columns I need the mean, several I need standard deviation, several I need the max value, etc. I am using documentation with the aggregate function within R and for some reason can't get it.
Can you help?
Attached is a simple example data set. For illustration purposes I need to aggregate by the "name" column. Var1 I need the mean, Var2 I need the standard deviation, Var3 I need the max value, Var4 I would like the sum
name v1 v2 v3 v4 v5
a 78 74 43 76 61
b 50 84 76 73 35
c 4 55 4 86 32
d 23 29 81 73 55
e 36 68 27 70 96
# A tibble: 11 x 5
name v1 v2 v3 v4
<fct> <dbl> <dbl> <int> <int>
1 a 78 NaN 43 76
2 b 50 NaN 76 73
3 c 4 NaN 4 86
4 d 23 NaN 81 73
5 e 36 NaN 27 70
6 f 13 NaN 34 77
7 g 2 NaN 57 79
8 h 72 NaN 29 3
9 i 17 NaN 56 14
10 j 87 NaN 36 13
11 k 9 NaN 74 41
Upvotes: 0
Views: 514
Reputation: 887231
We could use dplyr
for flexibility
library(dplyr)
df1 %>%
group_by(name) %>%
summarise(v1 = mean(v1, na.rm = TRUE),
v2 = sd(v2, na.rm = TRUE), v3 = max(v3, na.rm = TRUE),
v4 = sum(v4, na.rm = TRUE))
If there are multiple columns to be blocked for different functions, use across
df1 %>%
group_by(name) %>%
summarise(across(c(v1, v2), mean, na.rm = TRUE),
v3 = sd(v3, na.rm = TRUE),
across(c(v4, v5), sum, na.rm = TRUE))
Or use collap
from collapse
library(collapse)
collap(df1, ~ name, custom = list(fmean = c("v1", "v2"),
fsd = "v3", fsum = c("v4", "v5")))
Upvotes: 3