Reputation: 111
In a column of data I am trying to identify the minimum value in between a new high and the previous new high. In the example below I marked where the new highs are and what the minimum value is between them. What is the R formula to figure this out? In excel I would be able to do it using the match and max/min formula. I am not sure how to find the minimum value in a segment of a column in r.
data
0 New High
-80
-160
-160
-160
-160
-160
-347
-351
-351
-444
-444
-444
43 New High -444
43
10
10
-6
20
352 New High -6
352
352
528 New High 352
528
511
511
518
472
Upvotes: 0
Views: 1137
Reputation: 388982
You can use cummax
to calculate cumulative maximum until that row and create a new group whenever the current row's value is greater than previous cummax
value. Within each group you can return the minimum value.
library(dplyr)
df %>%
group_by(group = cumsum(V1 > lag(cummax(V1), default = first(V1)))) %>%
summarise(min_value = min(V1))
# group min_value
# <int> <int>
#1 0 -444
#2 1 -6
#3 2 352
#4 3 472
This considers the last part as another group hence also returns minimum value in that part. You can remove the last row if it is not needed.
To apply for multiple columns, we can write a function and call it with lapply
:
apply_fun <- function(data, col) {
col1 <- sym(col)
df %>%
group_by(group = cumsum(!!col1 > lag(cummax(!!col1),
default = first(!!col1)))) %>%
summarise(min_value = min(!!col1))
}
result <- lapply(names(df), apply_fun, data = df)
data
df <- structure(list(V1 = c(0L, -80L, -160L, -160L, -160L, -160L, -160L,
-347L, -351L, -351L, -444L, -444L, -444L, 43L, 43L, 10L, 10L,
-6L, 20L, 352L, 352L, 352L, 528L, 528L, 511L, 511L, 518L, 472L
)), class = "data.frame", row.names = c(NA, -28L))
Upvotes: 1