Reputation: 13
I'm very new to R and running out of ideas how to solve the following problem :(
My dataset 'test' looks like this
A B C y z
a1 b1 c1 0.10 0
a1 b1 c2 0.01 1
a1 b2 c1 0.20 1
a1 b2 c2 0.10 0
a2 b1 c1 0.10 0
a2 b1 c2 0.01 1
a2 b2 c1 0.20 0
a2 b2 c2 0.30 1
I want to aggregate my dataset by the sum of the 'y' values over the two dimensions 'A' and 'B', which can be done by
> aggregate(x = test$y, by = list(test$A, test$B), FUN=sum)
and returns the correct result:
Group.1 Group.2 x
a1 b1 0.11
a2 b1 0.11
a1 b2 0.30
a2 b2 0.50
So far, so good. In this simple case I can write the column names explicitly, but what if I want to parameterize them? Something like
> fields = 'test$A, test$B'
> aggregate(x = test$y, by = list(.(fields)), FUN=sum)
This throws an error that the arguments must have the same length. So how can I parameterize the aggregate list? I would be very grateful for any tips.
Upvotes: 1
Views: 378
Reputation: 887851
In addition to the aggregate-based options in the comments, the syntax for some efficient alternatives such as data.table or dplyr is below.
We convert the 'data.frame' to a 'data.table' (setDT(test)), group by 'fields', and get the sum of 'y'.
library(data.table)
fields <- c("A", "B")
setDT(test)[, .(y = sum(y)), by = fields]
# A B y
#1: a1 b1 0.11
#2: a1 b2 0.30
#3: a2 b1 0.11
#4: a2 b2 0.50
Or, using dplyr, we can pass the column names to group_by_ through the .dots argument and get the sum of 'y'.
library(dplyr)
test %>%
group_by_(.dots = fields) %>%
summarise(y = sum(y))
# A B y
# <chr> <chr> <dbl>
#1 a1 b1 0.11
#2 a1 b2 0.30
#3 a2 b1 0.11
#4 a2 b2 0.50
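Note that group_by_() has been deprecated since dplyr 0.7.0. A minimal sketch of the same grouping on newer versions, assuming dplyr 1.0.0 or later is installed (across() and all_of() are not available before that); the data frame is rebuilt from the question so the snippet runs on its own:

```r
library(dplyr)

# Example data copied from the question
test <- data.frame(
  A = rep(c("a1", "a2"), each = 4),
  B = rep(c("b1", "b1", "b2", "b2"), times = 2),
  C = rep(c("c1", "c2"), times = 4),
  y = c(0.10, 0.01, 0.20, 0.10, 0.10, 0.01, 0.20, 0.30),
  z = c(0, 1, 1, 0, 0, 1, 0, 1)
)

fields <- c("A", "B")  # grouping columns, given as a character vector

res <- test %>%
  group_by(across(all_of(fields))) %>%      # select grouping columns by name
  summarise(y = sum(y), .groups = "drop")   # sum 'y' per group, then ungroup

print(res)
```

all_of() raises an error if a name in 'fields' is not a column of 'test', which is usually preferable to silently grouping by fewer columns.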
Upvotes: 1
Reputation: 51592
One way is to make fields a list of your variables, i.e.
fields <- list(test$A, test$B)
aggregate(test$y, by = fields, FUN=sum)
or create a function,
fun1 <- function(v1, v2){aggregate(test$y, by = list(v1, v2), FUN = sum)}
fun1(test$A, test$B)
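If the column names should be held as strings, as in the question, a possible base R sketch is to index the data frame by name, since test[fields] is itself a list of columns. The data frame below is rebuilt from the question; the names res and res2 are only for illustration:

```r
# Example data copied from the question
test <- data.frame(
  A = rep(c("a1", "a2"), each = 4),
  B = rep(c("b1", "b1", "b2", "b2"), times = 2),
  C = rep(c("c1", "c2"), times = 4),
  y = c(0.10, 0.01, 0.20, 0.10, 0.10, 0.01, 0.20, 0.30),
  z = c(0, 1, 1, 0, 0, 1, 0, 1)
)

fields <- c("A", "B")  # grouping columns, given as a character vector

# test[fields] is a data frame, i.e. a named list of the grouping columns,
# so it can be passed directly as 'by'; the result keeps the names A, B, y
res <- aggregate(test["y"], by = test[fields], FUN = sum)

# Alternative: build the formula y ~ A + B from the names
res2 <- aggregate(reformulate(fields, response = "y"), data = test, FUN = sum)

print(res)
```

Compared with list(test$A, test$B), this keeps the original column names in the output instead of Group.1 and Group.2, and changing the grouping only requires editing 'fields'.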
Upvotes: 1