Reputation: 1078
I have a dataset that looks like this
df
ID size product x y
A 1 abc 0.3 5
B 1 abc 0.8 7
C 1 abc 0.5 2
D 3 def 0.6 1
And I want to aggregate it x with mean and y with sum
so the code for aggregate both by sum is like this
df1<-aggregate(list(x=df$x, y=df$y), by=list(df$size), FUN="sum")
How can I change that code to have a dataset like this one:
df2
size x y
1 0.53 14
3 0.6 1
Thanks in advance
Upvotes: 1
Views: 154
Reputation: 181
The function aggregate_multiple_fun
in the SSBtools
package is a wrapper to aggregate
that allows multiple functions and functions of several variables.
In this case a possibility is
library(SSBtools)
aggregate_multiple_fun(df, by = df["size"], vars = c(mean = "x", sum = "y"))
Then, the output variable names will be x_mean
and y_sum
. For the names to be x
and y
:
aggregate_multiple_fun(df, by = df["size"],
vars = list(x = list(mean = "x"),
y = list(sum = "y")))
To make use of the parameter na.rm
to mean
and sum
, extra parameters must be set to TRUE
:
aggregate_multiple_fun(df, by = df["size"],
vars = c(mean = "x", sum = "y"),
forward_dots = TRUE,
dots2dots = TRUE,
na.rm = TRUE)
Forwarding extra parameters can be a pitfall and therefore this is not done by default. The forward_dots
parameter is required, and in the case of mean
, dots2dots
is also required.
Upvotes: 1
Reputation: 886938
One option is collap
from collapse
library(collapse)
collap(slt(df, -ID, -product), ~ size, custom = list(fmean = 'x', fsum = 'y'))
# size x y
#1 1 0.5333333 14
#2 3 0.6000000 1
NOTE: we could have a range of columns in the list
i.e.
collap(slt(df, -ID, -product), ~ size, custom = list(fmean = 2:3))
# size x y
#1 1 0.5333333 4.666667
#2 3 0.6000000 1.000000
Or using the mtcars
data
collap(mtcars, ~ cyl, custom = list(fmean = c(1, 3:5),
fsum = c(6, 7), fmedian = 8:9))
# mpg cyl disp hp drat wt qsec vs am
#1 26.66364 4 105.1364 82.63636 4.070909 25.143 210.51 1 1
#2 19.74286 6 183.3143 122.28571 3.585714 21.820 125.84 1 0
#3 15.10000 8 353.1000 209.21429 3.229286 55.989 234.81 0 0
The return
can be a list
or long_dupl
or long
or wide
collap(mtcars, ~ cyl, custom = list(fmean = c(1, 3:5),
fsum = c(6, 7), fmedian = 8:9), return = "list")
If we want to use multiple functions for different columns, an option is to loop with Map
and Reduce
by merge
ing the list
elements to a single summarised output
Reduce(function(...) merge(..., by = 'size'),
Map(function(u, v) aggregate(reformulate('size', response = v), df,
FUN = match.fun(u)), c('mean', 'sum'), c('x', 'y')))
# size x y
#1 1 0.5333333 14
#2 3 0.6000000 1
df <- structure(list(ID = c("A", "B", "C", "D"), size = c(1L, 1L, 1L,
3L), product = c("abc", "abc", "abc", "def"), x = c(0.3, 0.8,
0.5, 0.6), y = c(5L, 7L, 2L, 1L)), class = "data.frame", row.names = c(NA,
-4L))
Upvotes: 4
Reputation: 1525
I would use the summarise()
function from the tidyverse
set of packages when dealing with data frames of this nature. This allows you to summarise over a group with multiple different equations. I have demonstrated what I'd expect the solution to look like below with this solution.
df %>%
group_by(size) %>%
summarise(
x = mean(x),
y = sum (y)
)
A base r solution would require a different approach.
Upvotes: 3