tora0515
tora0515

Reputation: 2547

dplyr: average over a range based on first occurrence in a different column

I would like to use dyplr and mutate to create a new variable that is either 0 or the average the values in column y, conditional on a range from column z.

For column z range, I would like to use the first time z >= 90 for the max value of the range and then the first time z=31 immediately before z >= 90 for the minimum value of the range.

Note: I will be grouping by column x

For example:

x  y   z 
1  100  0 
1  90   0 
1  90  31 
1  90  60 
1  80  31 
1  75  60 
1  60  90
1  60  60
2  60   0
2  60  30

I would to average y over this range:

  x  y   z
  1  80  31
  1  75  60
  1  60  90

so I would end up with the value 71.7 (I don't care about rounding).

x  y   z  ave
1  100  0    0
1  90   0    0
1  90  31    0
1  90  60    0
1  80  31 71.7
1  75  60 71.7
1  60  90 71.7
1  60  60    0 
2  60   0    0
2  60  30    0

Upvotes: 1

Views: 174

Answers (1)

Julius Vainora
Julius Vainora

Reputation: 48211

We may do

df %>% group_by(x) %>% mutate(ave = {
  if(any(z >= 90)) {
    idxU <- which.max(z >= 90)
    idxL <- max(which(z[1:idxU] == 31))
    replace(z * 0, idxL:idxU, mean(z[idxL:idxU]))
  } else {
    0
  }
})
#    x   y  z      ave
# 1  1 100  0  0.00000
# 2  1  90  0  0.00000
# 3  1  90 31  0.00000
# 4  1  90 60  0.00000
# 5  1  80 31 60.33333
# 6  1  75 60 60.33333
# 7  1  60 90 60.33333
# 8  1  60 60  0.00000
# 9  2  60  0  0.00000
# 10 2  60 30  0.00000

So, idxU is the upper limit for the range, idxL is the lower limit, then in the last line we replace elements idxL:idxU of the zero vector z * 0 by the required mean.

Upvotes: 1

Related Questions