Jorge Paredes
Jorge Paredes

Reputation: 1078

How to aggregate different columns with different functions

I have a dataset that looks like this

df

ID size product   x    y 
A   1     abc    0.3   5
B   1     abc    0.8   7
C   1     abc    0.5   2
D   3     def    0.6   1

And I want to aggregate it x with mean and y with sum

so the code for aggregate both by sum is like this

df1<-aggregate(list(x=df$x, y=df$y), by=list(df$size), FUN="sum")

How can I change that code to have a dataset like this one:

df2

size     x     y 
 1      0.53   14
 3      0.6    1

Thanks in advance

Upvotes: 1

Views: 154

Answers (3)

&#216;yvind Langsrud
&#216;yvind Langsrud

Reputation: 181

The function aggregate_multiple_fun in the SSBtools package is a wrapper to aggregate that allows multiple functions and functions of several variables.

In this case a possibility is

library(SSBtools)
aggregate_multiple_fun(df, by = df["size"], vars = c(mean = "x", sum = "y"))

Then, the output variable names will be x_mean and y_sum. For the names to be x and y:

aggregate_multiple_fun(df, by = df["size"], 
                   vars = list(x = list(mean = "x"), 
                               y = list(sum = "y")))

To make use of the parameter na.rm to mean and sum, extra parameters must be set to TRUE:

aggregate_multiple_fun(df, by = df["size"], 
                   vars = c(mean = "x", sum = "y"), 
                   forward_dots = TRUE, 
                   dots2dots = TRUE, 
                   na.rm = TRUE)

Forwarding extra parameters can be a pitfall and therefore this is not done by default. The forward_dots parameter is required, and in the case of mean, dots2dots is also required.

Upvotes: 1

akrun
akrun

Reputation: 886938

One option is collap from collapse

library(collapse)
collap(slt(df, -ID, -product), ~ size, custom = list(fmean = 'x', fsum = 'y'))
#  size         x  y
#1    1 0.5333333 14
#2    3 0.6000000  1

NOTE: we could have a range of columns in the list i.e.

collap(slt(df, -ID, -product), ~ size, custom = list(fmean = 2:3))
#  size         x        y
#1    1 0.5333333 4.666667
#2    3 0.6000000 1.000000

Or using the mtcars data

collap(mtcars, ~ cyl, custom = list(fmean = c(1, 3:5),
          fsum = c(6, 7), fmedian = 8:9))
#       mpg cyl     disp        hp     drat     wt   qsec vs am
#1 26.66364   4 105.1364  82.63636 4.070909 25.143 210.51  1  1
#2 19.74286   6 183.3143 122.28571 3.585714 21.820 125.84  1  0
#3 15.10000   8 353.1000 209.21429 3.229286 55.989 234.81  0  0

The return can be a list or long_dupl or long or wide

collap(mtcars, ~ cyl, custom = list(fmean = c(1, 3:5), 
       fsum = c(6, 7), fmedian = 8:9), return = "list")

If we want to use multiple functions for different columns, an option is to loop with Map and Reduce by mergeing the list elements to a single summarised output

Reduce(function(...) merge(..., by = 'size'), 
  Map(function(u, v) aggregate(reformulate('size', response = v), df,
      FUN = match.fun(u)), c('mean', 'sum'), c('x', 'y')))
#  size         x  y
#1    1 0.5333333 14
#2    3 0.6000000  1

data

df <- structure(list(ID = c("A", "B", "C", "D"), size = c(1L, 1L, 1L, 
3L), product = c("abc", "abc", "abc", "def"), x = c(0.3, 0.8, 
0.5, 0.6), y = c(5L, 7L, 2L, 1L)), class = "data.frame", row.names = c(NA, 
-4L))

Upvotes: 4

Joel Kandiah
Joel Kandiah

Reputation: 1525

I would use the summarise() function from the tidyverse set of packages when dealing with data frames of this nature. This allows you to summarise over a group with multiple different equations. I have demonstrated what I'd expect the solution to look like below with this solution.


df %>%
  group_by(size) %>%
  summarise(
    x = mean(x),
    y = sum (y)
  )

A base r solution would require a different approach.

Upvotes: 3

Related Questions