BSHuniversity

Reputation: 368

Dummy variable "switch-point" in R

I have a dummy variable that serves as a flag for a number of conditions in my data set. I can't figure out how to write a function that marks the spot at which the flag makes its "final switch", i.e. takes a value that will not change for the rest of the data frame. In the example below, every observation from the 7th onward is a "y".

  df <- dplyr::tibble(
    observation = c(seq(1,10)),
    crop = c(runif(3,1,25),
              runif(1,50,100),
              runif(2,1,10),
              runif(4,50,100)),
    flag = c(rep("n", 3),
             rep("y", 1),
             rep("n", 2),
             rep("y", 4)))

Which yields:

   observation  crop flag 
         <int> <dbl> <chr>
 1           1 13.3  n    
 2           2  4.34 n    
 3           3 17.1  n    
 4           4 80.5  y    
 5           5  9.62 n    
 6           6  8.39 n    
 7           7 92.6  y    
 8           8 74.1  y    
 9           9 95.3  y    
10          10 69.9  y    

I've tried creating a second flag that marks every switch and returns the "final" switch/flag variable, but over my whole data frame that will likely be highly inefficient. Any suggestions are welcome and appreciated.

Upvotes: 0

Views: 204

Answers (3)

akrun

Reputation: 887971

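We can make use of rleid from data.table. It assigns a run id that increases by one every time flag changes, so the final switch is the first row of the last run.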

library(data.table)
setDT(df)[, flag2 := rleid(flag)]
df
#    observation      crop flag flag2
# 1:           1 21.472985    n     1
# 2:           2 21.563190    n     1
# 3:           3  1.393184    n     1
# 4:           4 88.422562    y     2
# 5:           5  6.383627    n     3
# 6:           6  8.484030    n     3
# 7:           7 86.998953    y     4
# 8:           8 62.220592    y     4
# 9:           9 93.141503    y     4
#10:          10 96.006885    y     4
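
To get from the run ids to the actual switch point, take the first row that belongs to the last run. A minimal base R sketch of that step, assuming the flag is a plain character vector; the flag2 construction only imitates what rleid returns, and the variable names are illustrative:

```r
# Example flag vector from the question
flag <- c(rep("n", 3), "y", rep("n", 2), rep("y", 4))

# Run ids: increase by one each time the value changes (same idea as rleid)
flag2 <- cumsum(c(TRUE, flag[-1] != flag[-length(flag)]))

# First position that belongs to the last run
switch_point <- match(max(flag2), flag2)
switch_point
# [1] 7
```

Since everything here is vectorised, it should scale to a large data frame without an explicit loop.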

Upvotes: 0

dwhdai

Reputation: 304

One way to do this may be to create a flag that cumulatively sums occurrences of flag switches.

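library(dplyr)  # provides %>%, mutate() and lag()

# Cumulative sum that treats NA as 0 (lag() leaves an NA in the first row)
cumsum_na <- function(x){
  x[is.na(x)] <- 0
  return(cumsum(x))
}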

df <- dplyr::tibble(
    observation = c(seq(1,10)),
    crop = c(runif(3,1,25),
              runif(1,50,100),
              runif(2,1,10),
              runif(4,50,100)),
    flag = c(rep("n", 3),
             rep("y", 1),
             rep("n", 2),
             rep("y", 4)))

df %>%
  mutate(flag2 = ifelse(flag != lag(flag), 1, 0) %>%
               cumsum_na)

# A tibble: 10 x 4
   observation  crop flag  flag2
         <int> <dbl> <chr> <dbl>
 1           1 12.1  n         0
 2           2 11.2  n         0
 3           3  4.66 n         0
 4           4 61.6  y         1
 5           5  6.00 n         2
 6           6  9.54 n         2
 7           7 67.6  y         3
 8           8 86.7  y         3
 9           9 91.6  y         3
10          10 84.5  y         3

You can then do whatever you need with the flag2 column. For example, filter for its maximum value and take the first row, which gives you the first observation of the final constant state.
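
That last step could look roughly like this. It is only a sketch, assuming dplyr is installed; coalesce() stands in for the cumsum_na helper, and switch_row is a made-up name:

```r
library(dplyr)

# Same flag pattern as in the question (crop column left out for brevity)
df <- tibble(
  observation = 1:10,
  flag = c(rep("n", 3), "y", rep("n", 2), rep("y", 4))
)

switch_row <- df %>%
  # Count flag changes; the NA produced by lag() in row 1 is treated as "no change"
  mutate(flag2 = cumsum(coalesce(flag != lag(flag), FALSE))) %>%
  # Keep only the last run, then take its first row
  filter(flag2 == max(flag2)) %>%
  slice(1)

switch_row$observation
# [1] 7
```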

Upvotes: 2

Gonzalo Falloux Costa

Reputation: 372

I count all the "n" values first. Once the final "n" has been reached, the index of the next observation is the switch point.

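i <- 0                          # number of "n" values seen so far
j <- 1                          # current row index
n_total <- sum(df$flag == "n")  # total number of "n" values
while (i < n_total) {
  if (df$flag[j] == "n") i <- i + 1
  j <- j + 1                    # always move on to the next row
}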

After the loop, j is the row index you are looking for (7 in the example).
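
The same index can probably be found without a loop by locating the last value that differs from the final state. A hedged sketch, assuming the final state is known to be "y"; final_switch is a hypothetical helper, not something from a package:

```r
# Position right after the last element that is not the final state.
# Returns 1 if every element is already `final`, and NA if the vector
# does not end in `final` at all.
final_switch <- function(flag, final = "y") {
  not_final <- which(flag != final)
  if (length(not_final) == 0) return(1L)
  last <- max(not_final)
  if (last == length(flag)) return(NA_integer_)
  last + 1L
}

flag <- c(rep("n", 3), "y", rep("n", 2), rep("y", 4))
final_switch(flag)
# [1] 7
```

On a data frame this would be called as final_switch(df$flag).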

Upvotes: 0
