Return 3 next and previous rows if value occurs in particular row

Question

I have data frame like this:

    Input = (" v1 v2
          1 A1 0
          2 B1 0
          3 C1 0
          4 D1 1
          5 E1 0
          6 F1 0
          7 G1 0
          8 H1 0
          9 I1 0
          10 J1 0
          11 K1 0
          12 A2 1
          13 B2 0
          14 C2 0
          15 D2 0
          16 E2 0
          17 F2 0
          18 G2 0
          19 H2 0
          20 I2 0
          21 J2 0
          22 K2 0
           ")
df = as.data.frame(read.table(textConnection(Input), header = T, row.names=1))

And I'd like to keep only rows with 1 in v2 and 3 previous and next rows around each 1, so desired output is:

v1 v2
A1 0
B1 0
C1 0
D1 1
E1 0
F1 0
G1 0
I1 0
J1 0
K1 0
A2 1
B2 0
C2 0
D2 0

So we have all 1-rows (in this case 2) and 6 corresponding neighbor rows (3 lower, 3 upper). In orginal dataset I have 100k+ rows and only several 1-rows spreaded in whole dataset.

I tried to do this with simple ifelse() in apply for prevs and next rows separately and then combine everything together but it doesn't work.

prev <- as.data.frame(apply(df, 1, function(x) ifelse(x[1]==1,x-1:3,0)))
next <- as.data.frame(apply(df, 1, function(x) ifelse(x[1]==1,x+1:3,0)))

I was thinking to use lag() and lead() but I don't know how to lag or lead n=3 rows only around with 1 in v2. Could you please help me out?

Ric S · Accepted Answer

One possible solution (maybe a bit lengthy but very interesting for other purposes as well) is to create multiple lags and leads of the variable of interest and then filter for any variable that has value equal to 1.

We first create two functions that produce n lags and n leads, respectively, starting from a dataframe:

lags <- function(data, variable, n){
  require(dplyr)
  require(purrr)
  
  variable <- enquo(variable)
  
  indices <- seq_len(n)
  quosures <- map(indices, ~quo(lag(!!variable, !!.x))) %>% 
    set_names(sprintf("lag_%02d", indices))
  
  mutate(data, !!!quosures)
}



leads <- function(data, variable, n){
  require(dplyr)
  require(purrr)
  
  variable <- enquo(variable)
  
  indices <- seq_len(n)
  quosures <- map(indices, ~quo(lead(!!variable, !!.x))) %>% 
    set_names(sprintf("lead_%02d", indices))
  
  mutate(data, !!!quosures)
}

Then we apply them to our dataframe and filter the observations that contains a 1:

library(dplyr)

df %>% 
  lags(v2, n = 3) %>% 
  leads(v2, n = 3) %>% 
  filter_all(any_vars(. == 1)) %>% 
  select(v1, v2)

#    v1 v2
# 1  A1  0
# 2  B1  0
# 3  C1  0
# 4  D1  1
# 5  E1  0
# 6  F1  0
# 7  G1  0
# 8  I1  0
# 9  J1  0
# 10 K1  0
# 11 A2  1
# 12 B2  0
# 13 C2  0
# 14 D2  0

Return 3 next and previous rows if value occurs in particular row

Answers (2)

Related Questions