Reputation: 3973
I want to aggregate values of a data.table in R via multiple functions by a grouping variable AND keep the information that is in other columns (not included in the aggregation) in the corresponding rows (=the same row as the aggregation). An example:
Note: The code uses this which_quantile()-function (using sort(x) instead of order(x) in its code). It finds an actual value of a data set which is close to the defined quantile.
# sample data
dt <- structure(list(State = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
.Label = c("AK", "RI"), class = "factor"), Company = structure(1:8, .Label = c("A",
"B", "C", "D", "E", "F", "G", "H"), class = "factor"), Employees = c(82L,
104L, 37L, 24L, 19L, 118L, 88L, 42L), Number=c(1L,2L,3L,4L,5L,6L,7L,8L), Number2=c(9,10,11,12,13,14,15,16)),
.Names = c("State", "Company", "Employees", "Number", "Number2"), class = "data.frame", row.names = c(NA, 8L))
require(data.table)
setDT(dt)
# aggregation
agg <- dt[ , .(max = max(Employees),
min = min(Employees),
quantile70 = which.quantile(Employees, 0.7)), by=State]
agg_m <- dt[agg, on="State"]
aggregating a DT leads to the following output:
State max min quantile70
1: AK 104 24 82
2: RI 118 19 88
merging the aggregation with the original DT to that:
State Company Employees Number Number2 max min quantile70
1: AK A 82 1 9 104 24 82
2: AK B 104 2 10 104 24 82
3: AK C 37 3 11 104 24 82
4: AK D 24 4 12 104 24 82
5: RI E 19 5 13 118 19 88
6: RI F 118 6 14 118 19 88
7: RI G 88 7 15 118 19 88
8: RI H 42 8 16 118 19 88
Question: How can I aggregate the data.table and at the same time keep the corresponding values in the columns Company, Number and Number2? The max. of the Number column in the State AK is 104 and the corresponding value in column Number2 is 10. The min is 24 and the corresponding value 12 and so on. How to keep this information when aggregating a data.table?
desired output:
State Company Employees Number Number2 aggregation
1: AK A 82 1 9 quantile70
2: AK B 104 2 10 max
3: AK D 24 4 12 min
4: RI E 19 5 13 min
5: RI F 118 6 14 max
6: RI H 88 8 16 quantile70
The question is similar to this one. The sample data was also taken from there and adapted.
The following aggregations don't solve my problem:
dt[ ,.SD[ which.max(Employees) ], by=State]
dt[dt[ ,.I[ which.max(Employees) ], by=State ]$V1]
# only which.max() OR which.min() are possible
dt[ , max_Empl := max(Employees), by=State ]
# only ONE aggregation function at a time is possible
Upvotes: 2
Views: 4121
Reputation: 66819
Following @eddi's canonical answer on subsetting by group...
aggi <- dt[ , .(max = .I[which.max(Employees)],
min = .I[which.min(Employees)],
quantile70 = .I[which.quantile(Employees, 0.7)]), by=State]
From here, you can do
maggi <- melt(aggi, id="State")
dt[maggi$value][, v := maggi$variable][]
State Company Employees Number Number2 v
1: AK B 104 2 10 max
2: RI F 118 6 14 max
3: AK D 24 4 12 min
4: RI E 19 5 13 min
5: AK A 82 1 9 quantile70
6: RI G 88 7 15 quantile70
Upvotes: 1