r aggregate dataframe: some columns unchanged, some columns aggregated

Question

I have tried to look at manuals for aggregate, apply, etc, but I can't find examples where some columns are skipped by the function being applied, while other columns are copied as-is. Example:

> olddf = data.frame(code=c("one","one","two"), val1=c(1,2,3), val2=c(4,5,6), val3=c(7,8,9))
> olddf
  code val1 val2 val3
1  one    1    4    7
2  one    2    5    8
3  two    3    6    9
>

How do I aggregate olddf so that I get a new data frame where:

code column is kept as-is,
val1 is skipped
val2 is aggregated normally e.g. sum()
and a new column is created based on one of the earlier aggregates... e.g. new column = sum(val3)/sum(val2)?

I basically want:

> newdf
  code val2 newcol
1  one    9 1.6666
2  two    6 1.5000

I would like to do this in one step rather than defining a separate function to deal with each column / aggregate. Is this possible?

KFB · Accepted Answer

Try data.table

library(data.table)
setDT(olddf)[, .(val2 = sum(val2),
                 newcol = sum(val3)/sum(val2)), by = code]
#    code val2   newcol
# 1:  one    9 1.666667
# 2:  two    6 1.500000

r aggregate dataframe: some columns unchanged, some columns aggregated

Answers (2)

Related Questions