Reputation: 181
I am trying to get the max value per stretch of an indicator, i.e. per run of a repeating value.
Here is an example:
A = c(28, 20, 23, 30, 26, 23, 25, 26, 27, 25, 30, 26, 25, 22, 24, 25, 24, 27, 29)
B = c(0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1)
df <- as.data.frame(cbind(A, B))
df
A B
28 0
20 1
23 1
30 0
26 0
23 1
25 1
26 1
27 0
25 0
30 1
26 1
25 1
22 0
24 1
25 0
24 0
27 0
29 1
For each group or stretch of 1's in column B, I want to find the max in column A. The max column could be either an indicator that the value in A is the max of its stretch or the actual value in A, and it should be NA or 0 for all other rows.
The output I am hoping for looks something like this:
A B max
28 0 0
20 1 0
23 1 1
30 0 0
26 0 0
23 1 0
25 1 0
26 1 1
27 0 0
25 0 0
30 1 1
26 1 0
25 1 0
22 0 0
24 1 1
25 0 0
24 0 0
27 0 0
29 1 1
I've tried to generate a group for each stretch of column B that equals 1, but I did not get very far because most grouping functions require unique values between groups.
Also, please let me know if there are any improvements to the title for this problem.
Upvotes: 1
Views: 159
Reputation: 887088
One option would be data.table, grouping by rleid(B), which assigns a new id to each run of identical values in B:
library(data.table)
setDT(df)[, Max := +((A== max(A)) & B), rleid(B) ]
df
# A B Max
# 1: 28 0 0
# 2: 20 1 0
# 3: 23 1 1
# 4: 30 0 0
# 5: 26 0 0
# 6: 23 1 0
# 7: 25 1 0
# 8: 26 1 1
# 9: 27 0 0
#10: 25 0 0
#11: 30 1 1
#12: 26 1 0
#13: 25 1 0
#14: 22 0 0
#15: 24 1 1
#16: 25 0 0
#17: 24 0 0
#18: 27 0 0
#19: 29 1 1
Or, as @Frank mentioned, for better efficiency we can make use of gmax by first assigning the per-run maximum to a column and then updating only the matching rows:
setDT(df)[, MA := max(A), by = rleid(B)][A == MA & B == 1, Max := 1L][]
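Since the question also allows the max column to hold the actual value of A instead of a 0/1 flag, here is a minimal sketch of that variant. It assumes the same A and B as in the question; the names dt, run and MaxVal are made up for illustration and are not part of the answer above.

```r
library(data.table)

A <- c(28, 20, 23, 30, 26, 23, 25, 26, 27, 25, 30, 26, 25, 22, 24, 25, 24, 27, 29)
B <- c(0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1)
dt <- data.table(A, B)   # hypothetical name, separate from df above

# rleid() numbers consecutive runs of identical values in B
dt[, run := rleid(B)]

# Within each run, keep A where it equals the run maximum and B is 1; NA elsewhere
dt[, MaxVal := ifelse(B == 1 & A == max(A), A, NA_real_), by = run]
dt
```

If a run of 1's contains the maximum more than once, every tied row is kept, which is the same behaviour as the indicator versions above.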
Upvotes: 3
Reputation: 28339
Solution using dplyr
library(dplyr)
df %>%
    group_by(with(rle(B), rep(seq_along(lengths), lengths))) %>%
    mutate(MAX = ifelse(B == 0, 0, as.numeric(A == max(A)))) %>%
    .[, c(1, 2, 4)]
A B MAX
<dbl> <dbl> <dbl>
1 28 0 0
2 20 1 0
3 23 1 1
4 30 0 0
5 26 0 0
6 23 1 0
7 25 1 0
8 26 1 1
9 27 0 0
10 25 0 0
11 30 1 1
12 26 1 0
13 25 1 0
14 22 0 0
15 24 1 1
16 25 0 0
17 24 0 0
18 27 0 0
19 29 1 1
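The grouping expression in the pipeline above turns rle(B) into one id per run. As a rough sketch of the same idea without any packages, the run id can be built once and passed to ave(); the names run_id and MAX here are only illustrative.

```r
A <- c(28, 20, 23, 30, 26, 23, 25, 26, 27, 25, 30, 26, 25, 22, 24, 25, 24, 27, 29)
B <- c(0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1)
df <- data.frame(A, B)

# One id per consecutive run of identical values in B
run_id <- with(rle(df$B), rep(seq_along(lengths), lengths))

# ave() returns the run maximum repeated for every row of that run;
# flag rows where B is 1 and A equals that maximum
df$MAX <- as.integer(df$B == 1 & df$A == ave(df$A, run_id, FUN = max))
df
```

Keeping run_id as a separate vector also makes it easy to inspect which rows ended up in which stretch.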
Upvotes: 1