Reputation: 1862
Task: For all rows where condition==FALSE, set groupmean to the mean of all numbers by group. For all rows where condition==TRUE, set groupmean to the mean of numbers only where condition==TRUE, by group.
I would like to have a solution which does not require copying the whole data.table but adds the desired column in place. I bet there's a plain simple solution, but I got lost a little...
My attempts so far:
set.seed(42)
require(data.table)
DT <- data.table(condition=sample(c(TRUE,FALSE), 50, replace=TRUE),
                 group=rep(LETTERS[1:4], times=25),
                 numbers=1:100)
# modifies the right rows, but wrong value
DT[condition==FALSE, groupmean_1 := mean(numbers), by=group]
# right values, but applied to all rows, not only those where condition==FALSE
DT[, groupmean_2 := mean(numbers), by=group]
head(DT)
   condition group numbers groupmean_1 groupmean_2
1:     FALSE     A       1    42.66667          49
2:     FALSE     B       2    55.68421          50
3:      TRUE     C       3          NA          51
4:     FALSE     D       4    47.78947          52
5:     FALSE     A       5    42.66667          49
6:     FALSE     B       6    55.68421          50
Upvotes: 2
Views: 1929
Reputation: 3224
You should reverse the order in which you define groupmean. Compute it as the group average for all rows, and substitute the rows where condition == TRUE afterwards.
DT[, groupmean:=mean(numbers), by=group]
DT[condition==TRUE, groupmean:=mean(numbers), by='group,condition']
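If you would rather do it in a single grouped pass, something along these lines should give the same column. It is only a sketch: the toy data below draws 100 condition values instead of recycling 50 as in the question, and the ifelse() approach is an alternative I am suggesting, not a data.table-specific idiom. The column is still added by reference with :=, so the table is not copied.

```r
library(data.table)

set.seed(42)
# Toy data (assumption: 100 sampled condition values, no recycling)
DT <- data.table(condition = sample(c(TRUE, FALSE), 100, replace = TRUE),
                 group     = rep(LETTERS[1:4], times = 25),
                 numbers   = 1:100)

# One pass per group: rows with condition == TRUE get the mean over the
# TRUE rows of their group, all other rows get the mean of the whole group.
DT[, groupmean := ifelse(condition, mean(numbers[condition]), mean(numbers)),
   by = group]
```

Inside j, condition and numbers are the vectors for the current group only, so numbers[condition] picks just that group's TRUE rows.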
I hope that helps
Upvotes: 2