Reputation: 368
I have a dummy variable that serves as a flag for a number of conditions in my data set. I can't figure out how to write a function that marks the spot in which the flag assumes a "final switch" -- a value that will not change for the rest of the data frame. In the example below, everything after the 7th observation is a "y".
dplyr::tibble(
observation = c(seq(1,10)),
crop = c(runif(3,1,25),
runif(1,50,100),
runif(2,1,10),
runif(4,50,100)),
flag = c(rep("n", 3),
rep("y", 1),
rep("n", 2),
rep("y", 4)))
Which yields:
observation crop flag
<int> <dbl> <chr>
1 1 13.3 n
2 2 4.34 n
3 3 17.1 n
4 4 80.5 y
5 5 9.62 n
6 6 8.39 n
7 7 92.6 y
8 8 74.1 y
9 9 95.3 y
10 10 69.9 y
I've tried creating a second flag that marks every switch and returns the "final" switch/flag variable, but over my whole data frame that will likely be highly inefficient. Any suggestions are welcome and appreciated.
Upvotes: 0
Views: 204
Reputation: 887971
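We can use rleid from data.table to give each run of consecutive identical flag values its own id. The final switch is then the first row that carries the highest id.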
library(data.table)
setDT(df)[, flag2 := rleid(flag)]
df
# observation crop flag flag2
# 1: 1 21.472985 n 1
# 2: 2 21.563190 n 1
# 3: 3 1.393184 n 1
# 4: 4 88.422562 y 2
# 5: 5 6.383627 n 3
# 6: 6 8.484030 n 3
# 7: 7 86.998953 y 4
# 8: 8 62.220592 y 4
# 9: 9 93.141503 y 4
#10: 10 96.006885 y 4
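To get from the run ids to a single position, one option is base R's rle(), which exposes the same runs together with their lengths. This is a minimal sketch rather than part of the rleid approach itself; it assumes the flag pattern from the question, rebuilt by hand, and the name final_switch is made up for illustration.

```r
# Flag pattern from the question: 3 "n", 1 "y", 2 "n", 4 "y".
flag <- rep(c("n", "y", "n", "y"), times = c(3, 1, 2, 4))

runs <- rle(flag)                                # run values and run lengths
last_len <- runs$lengths[length(runs$lengths)]   # length of the final run

# The final run occupies the last `last_len` positions, so it starts here:
final_switch <- length(flag) - last_len + 1
final_switch
```

This avoids adding a helper column, which may matter if the data frame is large.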
Upvotes: 0
Reputation: 304
One way to do this may be to create a flag that cumulatively sums occurrences of flag switches.
cumsum_na <- function(x){
x[which(is.na(x))] <- 0
return(cumsum(x))
}
df <- dplyr::tibble(
observation = c(seq(1,10)),
crop = c(runif(3,1,25),
runif(1,50,100),
runif(2,1,10),
runif(4,50,100)),
flag = c(rep("n", 3),
rep("y", 1),
rep("n", 2),
rep("y", 4)))
library(dplyr)  # needed for %>%, mutate() and dplyr's lag()
df %>%
mutate(flag2 = ifelse(flag != lag(flag), 1, 0) %>%
cumsum_na)
# A tibble: 10 x 4
observation crop flag flag2
<int> <dbl> <chr> <dbl>
1 1 12.1 n 0
2 2 11.2 n 0
3 3 4.66 n 0
4 4 61.6 y 1
5 5 6.00 n 2
6 6 9.54 n 2
7 7 67.6 y 3
8 8 86.7 y 3
9 9 91.6 y 3
10 10 84.5 y 3
You can then use the flag2 column however you need, e.g. filter for its maximum value and take the first row, which gives the first observation of the final, constant state.
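The filter-then-first-row step could look something like the sketch below. It assumes a cut-down stand-in for the question's data (crop is left out because only flag matters) and builds the counter with coalesce() instead of the cumsum_na helper; final_row is a hypothetical name.

```r
library(dplyr)

# Stand-in data with the question's flag pattern: 3 "n", 1 "y", 2 "n", 4 "y".
df <- tibble(observation = 1:10,
             flag = rep(c("n", "y", "n", "y"), times = c(3, 1, 2, 4)))

final_row <- df %>%
  # Counter that goes up by one whenever flag differs from the previous row;
  # the NA produced by lag() on the first row is treated as "no switch".
  mutate(flag2 = cumsum(coalesce(flag != lag(flag), FALSE))) %>%
  filter(flag2 == max(flag2)) %>%  # keep only the last run
  slice(1)                         # its first row is the final switch

final_row$observation
```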
Upvotes: 2
Reputation: 372
I count all the "n" values first; once the final "n" has been reached, I take the index of the next observation.
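i <- 0                          # number of "n" values seen so far
j <- 1                          # row currently being inspected
n_total <- sum(df$flag == "n")  # total number of "n" values in the column
while (i < n_total) {
  if (df$flag[j] == "n") i <- i + 1
  j <- j + 1
}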
The index you are looking for is j, the row right after the last "n".
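The same idea can probably be written without a loop: find the last row that is not in the final state and step one row forward. This is only a sketch; final_switch_index is a hypothetical helper, and it assumes the flag column has no missing values.

```r
# Hypothetical helper: index of the first row of the final run,
# or NA if the flag never changes (or the vector is empty).
final_switch_index <- function(flag) {
  differs <- which(flag != flag[length(flag)])  # rows not in the final state
  if (length(differs) == 0) return(NA_integer_)
  max(differs) + 1L
}

# Flag pattern from the question: 3 "n", 1 "y", 2 "n", 4 "y".
flag <- rep(c("n", "y", "n", "y"), times = c(3, 1, 2, 4))
final_switch_index(flag)
```

Comparing against the last value rather than against "n" means the helper does not care which of the two states the data ends in.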
Upvotes: 0