Adamm
Adamm

Reputation: 2306

Return 3 next and previous rows if value occurs in particular row

I have data frame like this:

    Input = (" v1 v2
          1 A1 0
          2 B1 0
          3 C1 0
          4 D1 1
          5 E1 0
          6 F1 0
          7 G1 0
          8 H1 0
          9 I1 0
          10 J1 0
          11 K1 0
          12 A2 1
          13 B2 0
          14 C2 0
          15 D2 0
          16 E2 0
          17 F2 0
          18 G2 0
          19 H2 0
          20 I2 0
          21 J2 0
          22 K2 0
           ")
df = as.data.frame(read.table(textConnection(Input), header = T, row.names=1))

And I'd like to keep only rows with 1 in v2 and 3 previous and next rows around each 1, so desired output is:

v1 v2
A1 0
B1 0
C1 0
D1 1
E1 0
F1 0
G1 0
I1 0
J1 0
K1 0
A2 1
B2 0
C2 0
D2 0

So we have all 1-rows (in this case 2) and 6 corresponding neighbor rows (3 lower, 3 upper). In orginal dataset I have 100k+ rows and only several 1-rows spreaded in whole dataset.

I tried to do this with simple ifelse() in apply for prevs and next rows separately and then combine everything together but it doesn't work.

prev <- as.data.frame(apply(df, 1, function(x) ifelse(x[1]==1,x-1:3,0)))
next <- as.data.frame(apply(df, 1, function(x) ifelse(x[1]==1,x+1:3,0)))

I was thinking to use lag() and lead() but I don't know how to lag or lead n=3 rows only around with 1 in v2. Could you please help me out?

Upvotes: 1

Views: 61

Answers (2)

Ric S
Ric S

Reputation: 9247

One possible solution (maybe a bit lengthy but very interesting for other purposes as well) is to create multiple lags and leads of the variable of interest and then filter for any variable that has value equal to 1.

We first create two functions that produce n lags and n leads, respectively, starting from a dataframe:

lags <- function(data, variable, n){
  require(dplyr)
  require(purrr)
  
  variable <- enquo(variable)
  
  indices <- seq_len(n)
  quosures <- map(indices, ~quo(lag(!!variable, !!.x))) %>% 
    set_names(sprintf("lag_%02d", indices))
  
  mutate(data, !!!quosures)
}



leads <- function(data, variable, n){
  require(dplyr)
  require(purrr)
  
  variable <- enquo(variable)
  
  indices <- seq_len(n)
  quosures <- map(indices, ~quo(lead(!!variable, !!.x))) %>% 
    set_names(sprintf("lead_%02d", indices))
  
  mutate(data, !!!quosures)
}

Then we apply them to our dataframe and filter the observations that contains a 1:

library(dplyr)

df %>% 
  lags(v2, n = 3) %>% 
  leads(v2, n = 3) %>% 
  filter_all(any_vars(. == 1)) %>% 
  select(v1, v2)

#    v1 v2
# 1  A1  0
# 2  B1  0
# 3  C1  0
# 4  D1  1
# 5  E1  0
# 6  F1  0
# 7  G1  0
# 8  I1  0
# 9  J1  0
# 10 K1  0
# 11 A2  1
# 12 B2  0
# 13 C2  0
# 14 D2  0

Upvotes: 3

Ronak Shah
Ronak Shah

Reputation: 388817

We can find out indices where v2 = 1 occurs and use sapply to generate row numbers -3 to +3 of each index.

#get row index where v2 = 1
inds <- which(df$v2 == 1)
#unique to remove overlapping row index
inds2 <- unique(c(sapply(inds, `+`, -3:3)))
#remove negative values or values which are greater than number of rows in df
inds2 <- inds2[inds2 > 0 & inds2 <= nrow(df)]
#select rows.
df[inds2, ]

#   v1 v2
#1  A1  0
#2  B1  0
#3  C1  0
#4  D1  1
#5  E1  0
#6  F1  0
#7  G1  0
#9  I1  0
#10 J1  0
#11 K1  0
#12 A2  1
#13 B2  0
#14 C2  0
#15 D2  0

Upvotes: 2

Related Questions