Mark Miller

Reputation: 13103

delete rows with apply

I have a data.frame and wish to delete rows that match certain somewhat complex criteria. I can do so using a repetitive series of lines as below. However, this approach is not general.

my.df <- read.table(text = '
  Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9
    0    1    0    1    1    1    0    0    0
    1    0    1    1    1    1    0    0    1
    0    1    1    0    1    1    0    0    1
    0    1    1    1    1    1    1    1    1
    1    0    1    0    1    1    0    1    1
    0    0    1    0    0    0    1    0    1
    0    0    1    0    0    0    0    0    0
    1    0    1    0    1    1    1    0    0
    1    1    1    1    0    0    1    0    1
    0    1    0    0    0    0    0    0    1
    0    0    1    1    1    0    1    0    1
    1    0    0    0    1    0    0    0    1
    1    0    1    1    0    0    0    1    0
    0    0    1    1    0    0    1    1    1
    1    0    0    0    1    0    0    1    0
    0    0    0    0    0    1    0    1    1
    1    1    0    0    1    1    1    1    1
    0    0    1    0    0    0    0    1    0
    0    0    1    1    1    0    1    0    0
    0    1    0    0    1    1    1    0    0
', header = TRUE)

desired.result <- read.table(text = '
  Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9
    0    0    1    0    0    0    0    0    0
    1    1    1    1    0    0    1    0    1
    1    0    1    1    0    0    0    1    0
    0    0    1    0    0    0    0    1    0
', header = TRUE)

# this works, but is not general

my.df2 <- my.df
my.df2 <- my.df2[!(my.df2[,1]==0 & (my.df2[,4]==1 | my.df2[,5]==1)),]
my.df2 <- my.df2[!(my.df2[,2]==0 & (my.df2[,6]==1 | my.df2[,7]==1)),]
my.df2 <- my.df2[!(my.df2[,3]==0 & (my.df2[,8]==1 | my.df2[,9]==1)),]
my.df2

row.names(my.df2) <- NULL
all.equal(my.df2, desired.result)
# [1] TRUE

I would like to generalize this code. I regularly combine sapply and apply to operate on data. However, I have never combined those functions to delete rows, and I cannot figure out how to do it.

The code below identifies which rows to delete, but does not delete them: it returns a logical matrix with one column per condition, where FALSE marks a row that the condition would remove. Numerous variations of the code below have not worked.

my.df3 <- as.matrix(my.df)

sapply(seq_along(1:3), function(i) {
       apply(my.df3, 1, function(j) { 
            !(j[i]==0 & (j[(i+1)*2]==1 | j[((i+1)*2+1)]==1)) 
       }) 
})

I could find no solution searching the internet for 'delete rows with apply'. Thank you for any advice. I prefer a solution in base R. I suspect a simple modification of the sapply statement is all that is needed, although perhaps an entirely different approach is better.

Upvotes: 0

Views: 441

Answers (2)

Mark Miller

Reputation: 13103

Here is a working solution that combines the variations I tried before posting with Robert Krzyzanowski's suggestion of using the logical result of apply as the row index into my.df3:

my.df3 <- as.matrix(my.df)

# one column per condition; TRUE means the row passes that condition
my.test <- sapply(1:3, function(i) {
                  apply(my.df3, 1, function(j) { 
                       !(j[i]==0 & (j[(i+1)*2]==1 | j[((i+1)*2+1)]==1)) 
                  })
}) 

# keep only the rows that pass all three conditions
my.df3[apply(my.test, 1, all), ]

     Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9
[1,]    0    0    1    0    0    0    0    0    0
[2,]    1    1    1    1    0    0    1    0    1
[3,]    1    0    1    1    0    0    0    1    0
[4,]    0    0    1    0    0    0    0    1    0
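
For comparison, the same rule can probably be written without apply by comparing whole column blocks at once. The following is only a sketch under stated assumptions: drop_flagged_rows and small.df are hypothetical names introduced here, and small.df holds just four rows copied from my.df (rows 1, 7, 9 and 10) to keep the example short.

```r
# Hypothetical helper (not from the original post): drop rows where, for any
# i in 1..n, column i is 0 and column 2*(i+1) or column 2*(i+1)+1 is 1.
drop_flagged_rows <- function(df, n = 3) {
  m <- as.matrix(df)
  i <- seq_len(n)
  # logical matrix with one column per condition; TRUE marks a row to delete
  flagged <- m[, i, drop = FALSE] == 0 &
    (m[, 2 * (i + 1), drop = FALSE] == 1 |
     m[, 2 * (i + 1) + 1, drop = FALSE] == 1)
  # keep rows that no condition flagged
  df[rowSums(flagged) == 0, , drop = FALSE]
}

# four rows taken from my.df in the question
small.df <- read.table(text = '
  Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9
     0    1    0    1    1    1    0    0    0
     0    0    1    0    0    0    0    0    0
     1    1    1    1    0    0    1    0    1
     0    1    0    0    0    0    0    0    1
', header = TRUE)

result <- drop_flagged_rows(small.df)
row.names(result) <- NULL
result   # the second and third rows of small.df remain
```

Because the comparisons run on whole matrices, the number of conditions only enters through n; the column layout (condition column i paired with columns 2*(i+1) and 2*(i+1)+1) is still assumed to match the question.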

Upvotes: 0

Robert Krzyzanowski

Reputation: 9344

First off, seq_along(1:3) is redundant as that function will simply return 1:3. Second, if the result of your apply(..., 1, ...) call is a logical vector, you can simply subset using it:

my.df3[apply(my.df3, 1, ...), ]
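
As a minimal illustration of that pattern, here is a sketch on a toy matrix. The matrix m, its column names and the condition are made up for this example and are not the question's data.

```r
# Toy matrix (hypothetical data, not my.df3 from the question)
m <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 0, 1,
              1, 1, 0),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("A", "B", "C")))

# apply over rows returns one TRUE/FALSE per row:
# keep a row unless A is 0 and (B is 1 or C is 1)
keep <- apply(m, 1, function(r) !(r[1] == 0 & (r[2] == 1 | r[3] == 1)))

# logical row index; drop = FALSE keeps a matrix even if one row survives
m[keep, , drop = FALSE]   # rows 2 and 4 of m remain
```

The drop = FALSE argument is an addition here; without it, a result with a single surviving row would be simplified to a plain vector.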

Upvotes: 1
