Mike
Mike

Reputation: 1069

Remove data.frame rows containing zero, where adjacent rows contain zero

I would like to remove all rows that contain a zero, but only if (the row below it is zero AND the row above it is also zero) OR (it is a zero AND it is the first observation AND the number below it is also a zero).

For example:

RowNumb   Column2  
  1          0  
  2          0  
  3          0  
  4          1   
  5          0  
  6          1    
  7          1  
  8          0  
  9          0   
  10         0

I would like to remove rows 1, 2, 9 and 10, as these are the only rows that are equal to zero, with a zero below them and either a zero or nothing (in the case of rowNumb 1) above them, such that I get the following:

RowNumb   Column2  
  3          0  
  4          1   
  5          0  
  6          1    
  7          1  
  8          0  

Does anyone know of a way of doing this without using a loop?

Upvotes: 2

Views: 153

Answers (3)

G. Grothendieck
G. Grothendieck

Reputation: 269441

1) rollapply This uses rollapply from the zoo package to check if any of three in a row (and because of partial=TRUE two in a row at each end) are not zero:

library(zoo)

DF[ rollapply(DF$Column2 != 0, 3, any, partial = TRUE), ]

giving:

  RowNumb Column2
3       3       0
4       4       1
5       5       0
6       6       1
7       7       1
8       8       0

1a) This variation also works:

DF[ rollapply(c(0, DF$Column2, 0) != 0, 3, any), ]

2) embed This solution does not use any packages. For this example embed forms a 10 x 3 matrix whose rows contain successive triples (two in a row and a zero for rows at the ends) and computes a logical matrix from that and applies any to each row:

DF[apply(embed(c(0, DF$Column2, 0) != 0, 3), 1, any), ]

Upvotes: 2

Roland
Roland

Reputation: 132606

You can use filter to sum each absolute value with the preceding and following absolute value and compare that sum with 0:

DF <- read.table(text="RowNumb   Column2  
  1          0  
  2          0  
  3          0  
  4          1   
  5          0  
  6          1    
  7          1  
  8          0  
  9          0   
  10         0", header=TRUE)

rem <- na.omit(filter(abs(c(0, DF$Column2, 0)), rep(1, 3)) != 0L)

DF[rem,]
#  RowNumb Column2
#3       3       0
#4       4       1
#5       5       0
#6       6       1
#7       7       1
#8       8       0

This assumes there are no NA values. If those can occur you'd need to modify this a bit:

x <- c(0, DF$Column2, 0)         
rem <- na.omit(filter(x != 0L | is.na(x) , rep(1, 3)) != 0L)

Upvotes: 4

talat
talat

Reputation: 70256

Here's a dplyr approach using lag and lead functions:

require(dplyr)
df %>% filter(!(Column2 == 0 & lag(Column2, default = 0) == 0 & lead(Column2,default = 0) == 0))
#  RowNumb Column2
#1       3       0
#2       4       1
#3       5       0
#4       6       1
#5       7       1
#6       8       0

Upvotes: 3

Related Questions