Reputation: 1458
Apologies for the convoluted title. Essentially, I am aggregating some data in R.
df <- data.frame(value1=c(1,2,3,4,5,6,7,8,9,10),
+ value2=c(2,3,4,5,6,7,8,9,1,2),
+ group=c("a","b","a","b","a","b","a","b","a","c"))
DT <- as.data.table(df)
The resulting data looks like this:
value1 value2 group
1 2 a
2 3 b
3 4 a
4 5 b
5 6 a
6 7 b
7 8 a
8 9 b
9 1 a
10 2 c
I would like to find the value1 for each group of the row with the highest value2, using data.table.
For example, the above table would turn into:
new_val group
7 a
8 b
10 c
...because 7 is the value of value1 for the row with the max value2 in group a, and so on and so forth.
I'm having a very difficult time doing this while creating a new datatable with a number of grouping operations.
The code I am using currently looks like:
DT[,list(
rate_max_acct = max(value1(which.max(value2)))
),
by=group] -> DT2
This isn't working -- what I was trying to do is select the row where value2 is maximum for each froup, and then find the max of value1 for that group, but it doesn't work and throws the following error:
could not find function "value1"
So evidently it thinks that I'm trying to apply a function, not look for an object.
Any help here would be greatly appreciated, especially if I can do it within this list -- I'm doing a bunch of grouping operations at once and would really like it if I could keep the "xxx = yyy" syntax.
Upvotes: 2
Views: 100
Reputation: 1289
Simple extension to previous answer in case you are looking for multiple maxima.
library(data.table) ; setkey(DT, group)
tmp <- DT[, .(value1[which(value2 == max(value2)), max(value2))], by = group]
This will give you a three column data.table
with the first column the group identifier, the second column all the variables value1
for which value2
reaches the maximum, and the third column, the maximum value2
per group
.
Upvotes: 0
Reputation: 31171
You can do:
DT[,list(newval=value1[which.max(value2)]), group]
# group newval
#1: a 7
#2: b 8
#3: c 10
You do not need so much max(value1[which.max(value2)])
since which.max
will give you the index of the row where value2 is maximum, you just need value1[which.max(value2)]
Upvotes: 2