I want to calculate the mean of an outcome variable, but only over the rows where another running variable takes its maximum value, computed separately within each group.
Of course, the mean could be replaced by any other summary function, and the within-group condition (here, the maximum) could be any other criterion.
library(data.table) # 1.9.5
dt <- data.table(name   = rep(LETTERS[1:7], each = 3),
                 target = rep(c(0, 1, 2), 7),
                 filter = 1:21)
dt
## name target filter
## 1: A 0 1
## 2: A 1 2
## 3: A 2 3
## 4: B 0 4
## 5: B 1 5
## 6: B 2 6
## 7: C 0 7
With this table, the desired output is, for each name, the mean of target over the rows where filter is at its group maximum. Here that value is exactly 2 for every group.
Something like:
dt[ , .(mFilter = which.max(filter),
target = target), by = name][ ,
mean(target), by = c("name", "mFilter")]
... seems close, but doesn't return the right result.
The solution should return:
## name V1
## 1: A 2
## 2: B 2
## 3: ...
Upvotes: 4
Views: 946
You could do this with:
dt[, .(meantarget = mean(target[filter == max(filter)])), by = name]
# name meantarget
# 1: A 2
# 2: B 2
# 3: C 2
# 4: D 2
# 5: E 2
# 6: F 2
# 7: G 2
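For context, the attempt in the question likely falls short because which.max() returns the position of the maximum within each group (3 for every name here), not a row filter, so the chained mean(target) still averages all three rows. A minimal sketch of two variations follows, assuming the same dt as in the question. The first uses which.max() as an index, which keeps only the first maximal row if there are ties. The second wraps the pattern in a helper so the summary function and the selection rule can be swapped; summarise_where and its arguments stat and pick are hypothetical names introduced here for illustration, not part of data.table.

```r
library(data.table)

dt <- data.table(name   = rep(LETTERS[1:7], each = 3),
                 target = rep(c(0, 1, 2), 7),
                 filter = 1:21)

# which.max() gives the position of the first maximum within the group,
# so it can index target directly (ties: only the first maximal row is used).
res_first <- dt[, .(meantarget = mean(target[which.max(filter)])), by = name]

# Hypothetical helper: 'stat' is the summary function, 'pick' maps the
# filter column to a logical (or positional) selector within each group.
summarise_where <- function(d, stat = mean,
                            pick = function(x) x == max(x)) {
  d[, .(value = stat(target[pick(filter)])), by = name]
}

# Same result as the answer above: mean of target at the maximal filter.
res_general <- summarise_where(dt)

# Swapping both pieces: median of target at the minimal filter.
res_median_min <- summarise_where(dt, stat = median,
                                  pick = function(x) x == min(x))
```

The logical form filter == max(filter) averages over all tied maxima, while the which.max() form picks a single row, so which one fits depends on how ties should be handled.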
Upvotes: 4