Reputation: 1932
I have a very large data set with many columns. I need to aggregate this data and perform different functions on each set of columns. I have many columns so it is cumbersome to manually specify each function to be used with each column in the aggregation. I have read many post on how to aggregate with data.table
but none seem to handle this situation.
Example
Lets work with the mpg
dataset. I want to aggregate by cyl
and trans
. I want to return the average for the cty
and hwy
columns but want to return the first value from manufacturer
and model
. In reality I would apply to many more columns by specifying cols<-names(data)[10:50]
and .SDcols=cols
. I can process each set of cols separately and then merge the data afterwards, but even this gets messy when you have dozens of different column groups. I believe there must be a better way with data.table
.
require("data.table")
require("ggplot2") #for the mpg dataset
dat1<-data.table(mpg)
perf<-dat1[, lapply(.SD, ave), .SDcols=c("cty","hwy"), by=list(cyl,trans)] #Aggregate performance data.
info<-dat1[, lapply(.SD, first), .SDcols=c("manufacturer","model"), by=list(cyl,trans)] #Aggregate model data.
perf[info, on=list(cyl,trans)] #Merge data
> head(perf[info, on=list(cyl,trans)])
cyl trans cty hwy manufacturer model
1: 4 auto(l5) 6 6 audi a4
2: 4 manual(m5) 33 33 audi a4
3: 4 manual(m6) 7 7 audi a4
4: 4 auto(av) 2 2 audi a4
5: 6 auto(l5) 16 16 audi a4
6: 6 manual(m5) 18 18 audi a4
Question
How can I perform the above in 1 operation? What I am thinking is to subset each group and then do something like this. Maybe a bit overambitious but I'm just shooting for the stars I guess.
dat1[, .(names(group1)=lapply(group1, ave),names(group2)=lapply(group2, first)), by=list(cyl,trans)]
Upvotes: 2
Views: 201
Reputation: 66819
group1 = c("cty", "hwy")
group2 = c("manufacturer", "model")
dat1[, c(
lapply(.SD[, ..group1], mean),
lapply(.SD[, ..group2], first)
), by=.(cyl, trans)]
gives
cyl trans cty hwy manufacturer model
1: 4 auto(l5) 20.33333 31.00000 audi a4
2: 4 manual(m5) 21.54545 29.27273 audi a4
3: 4 manual(m6) 21.00000 29.57143 audi a4
4: 4 auto(av) 22.00000 30.50000 audi a4
5: 6 auto(l5) 15.18750 21.43750 audi a4
6: 6 manual(m5) 16.66667 22.66667 audi a4
7: 6 auto(av) 18.66667 26.00000 audi a4
8: 4 auto(s6) 20.50000 28.25000 audi a4 quattro
9: 6 auto(s6) 17.40000 26.00000 audi a4 quattro
10: 6 manual(m6) 16.00000 22.60000 audi a4 quattro
11: 8 auto(s6) 13.60000 20.40000 audi a6 quattro
12: 8 auto(l4) 12.20000 16.73333 chevrolet c1500 suburban 2wd
13: 8 manual(m6) 13.42857 20.00000 chevrolet corvette
14: 4 auto(l4) 20.50000 27.62500 chevrolet malibu
15: 6 auto(l4) 16.03448 22.68966 chevrolet malibu
16: 4 auto(l3) 21.00000 27.00000 dodge caravan 2wd
17: 6 auto(l6) 16.00000 23.00000 dodge caravan 2wd
18: 8 auto(l5) 12.29412 16.41176 dodge dakota pickup 4wd
19: 8 manual(m5) 13.00000 18.80000 dodge dakota pickup 4wd
20: 8 auto(l6) 12.50000 18.50000 ford expedition 2wd
21: 8 auto(s5) 12.00000 18.00000 nissan pathfinder 4wd
22: 8 auto(s4) 16.00000 25.00000 pontiac grand prix
23: 4 auto(s4) 20.00000 26.00000 subaru impreza awd
24: 4 auto(s5) 22.00000 31.00000 toyota camry solara
25: 6 auto(s5) 18.00000 27.00000 toyota camry solara
26: 5 auto(s6) 20.50000 29.00000 volkswagen jetta
27: 5 manual(m5) 20.50000 28.50000 volkswagen jetta
cyl trans cty hwy manufacturer model
Upvotes: 4