Reputation: 2547
I would like to use dyplr and mutate to create a new variable that is either 0 or the average the values in column y, conditional on a range from column z.
For column z range, I would like to use the first time z >= 90 for the max value of the range and then the first time z=31 immediately before z >= 90 for the minimum value of the range.
Note: I will be grouping by column x
For example:
x y z
1 100 0
1 90 0
1 90 31
1 90 60
1 80 31
1 75 60
1 60 90
1 60 60
2 60 0
2 60 30
I would to average y over this range:
x y z
1 80 31
1 75 60
1 60 90
so I would end up with the value 71.7 (I don't care about rounding).
x y z ave
1 100 0 0
1 90 0 0
1 90 31 0
1 90 60 0
1 80 31 71.7
1 75 60 71.7
1 60 90 71.7
1 60 60 0
2 60 0 0
2 60 30 0
Upvotes: 1
Views: 174
Reputation: 48211
We may do
df %>% group_by(x) %>% mutate(ave = {
if(any(z >= 90)) {
idxU <- which.max(z >= 90)
idxL <- max(which(z[1:idxU] == 31))
replace(z * 0, idxL:idxU, mean(z[idxL:idxU]))
} else {
0
}
})
# x y z ave
# 1 1 100 0 0.00000
# 2 1 90 0 0.00000
# 3 1 90 31 0.00000
# 4 1 90 60 0.00000
# 5 1 80 31 60.33333
# 6 1 75 60 60.33333
# 7 1 60 90 60.33333
# 8 1 60 60 0.00000
# 9 2 60 0 0.00000
# 10 2 60 30 0.00000
So, idxU
is the upper limit for the range, idxL
is the lower limit, then in the last line we replace elements idxL:idxU
of the zero vector z * 0
by the required mean.
Upvotes: 1