Reputation: 13103
I have a data.frame and wish to delete rows that match certain somewhat complex criteria. I can do so using a repetitive series of lines, as below. However, this approach is not general.
my.df <- read.table(text = '
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9
0 1 0 1 1 1 0 0 0
1 0 1 1 1 1 0 0 1
0 1 1 0 1 1 0 0 1
0 1 1 1 1 1 1 1 1
1 0 1 0 1 1 0 1 1
0 0 1 0 0 0 1 0 1
0 0 1 0 0 0 0 0 0
1 0 1 0 1 1 1 0 0
1 1 1 1 0 0 1 0 1
0 1 0 0 0 0 0 0 1
0 0 1 1 1 0 1 0 1
1 0 0 0 1 0 0 0 1
1 0 1 1 0 0 0 1 0
0 0 1 1 0 0 1 1 1
1 0 0 0 1 0 0 1 0
0 0 0 0 0 1 0 1 1
1 1 0 0 1 1 1 1 1
0 0 1 0 0 0 0 1 0
0 0 1 1 1 0 1 0 0
0 1 0 0 1 1 1 0 0
', header = TRUE)
desired.result <- read.table(text = '
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9
0 0 1 0 0 0 0 0 0
1 1 1 1 0 0 1 0 1
1 0 1 1 0 0 0 1 0
0 0 1 0 0 0 0 1 0
', header = TRUE)
# this works, but is not general
my.df2 <- my.df
my.df2 <- my.df2[!(my.df2[,1]==0 & (my.df2[,4]==1 | my.df2[,5]==1)),]
my.df2 <- my.df2[!(my.df2[,2]==0 & (my.df2[,6]==1 | my.df2[,7]==1)),]
my.df2 <- my.df2[!(my.df2[,3]==0 & (my.df2[,8]==1 | my.df2[,9]==1)),]
my.df2
row.names(my.df2) <- NULL
all.equal(my.df2, desired.result)
# [1] TRUE
I would like to generalize this code. I regularly combine sapply and apply to operate on data. However, I have never combined those functions to delete data, and I cannot figure out how to do it.
The code below flags the rows to delete (FALSE marks a row that one of the rules would remove), but does not delete them. Numerous variations of the code below have not worked.
my.df3 <- as.matrix(my.df)
sapply(seq_along(1:3), function(i) {
apply(my.df3, 1, function(j) {
!(j[i]==0 & (j[(i+1)*2]==1 | j[((i+1)*2+1)]==1))
})
})
I could find no solution searching the internet for 'delete rows with apply'. Thank you for any advice. I prefer a solution in base R. I suspect a simple modification of the sapply statement is all that is needed, although perhaps an entirely different approach is better.
Upvotes: 0
Views: 441
Reputation: 13103
Here is a working solution that combines variations I tried before posting with Robert Krzyzanowski's suggestion of using an apply call as the row index of my.df3:
my.df3 <- as.matrix(my.df)
my.test <- sapply(seq_along(1:3), function(i) {
apply(my.df3, 1, function(j) {
!(j[i]==0 & (j[(i+1)*2]==1 | j[((i+1)*2+1)]==1))
})
})
my.df3[apply(my.test, 1, all), ]
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9
[1,] 0 0 1 0 0 0 0 0 0
[2,] 1 1 1 1 0 0 1 0 1
[3,] 1 0 1 1 0 0 0 1 0
[4,] 0 0 1 0 0 0 0 1 0
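As a possible alternative to nesting apply inside sapply, the same three rules can be checked column-wise, since comparisons on data.frame columns are already vectorized over rows. The sketch below is only one way to do it; the helper name drop_by_rule and the variable keep are hypothetical and do not come from the original posts. It assumes, as in the question, that rule i pairs column i with columns 2*i + 2 and 2*i + 3.

```r
# Data copied from the question.
my.df <- read.table(text = '
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9
0 1 0 1 1 1 0 0 0
1 0 1 1 1 1 0 0 1
0 1 1 0 1 1 0 0 1
0 1 1 1 1 1 1 1 1
1 0 1 0 1 1 0 1 1
0 0 1 0 0 0 1 0 1
0 0 1 0 0 0 0 0 0
1 0 1 0 1 1 1 0 0
1 1 1 1 0 0 1 0 1
0 1 0 0 0 0 0 0 1
0 0 1 1 1 0 1 0 1
1 0 0 0 1 0 0 0 1
1 0 1 1 0 0 0 1 0
0 0 1 1 0 0 1 1 1
1 0 0 0 1 0 0 1 0
0 0 0 0 0 1 0 1 1
1 1 0 0 1 1 1 1 1
0 0 1 0 0 0 0 1 0
0 0 1 1 1 0 1 0 0
0 1 0 0 1 1 1 0 0
', header = TRUE)

# Hypothetical helper: rule i drops a row when column i is 0 and
# either column 2*i + 2 or column 2*i + 3 is 1.
drop_by_rule <- function(df, i) {
  df[[i]] == 0 & (df[[2 * i + 2]] == 1 | df[[2 * i + 3]] == 1)
}

# Start by keeping every row, then clear the flag for any row a rule drops.
keep <- rep(TRUE, nrow(my.df))
for (i in 1:3) {
  keep <- keep & !drop_by_rule(my.df, i)
}

# Subset with the logical vector and reset the row names.
result <- my.df[keep, ]
row.names(result) <- NULL
result
```

Because keep is an ordinary logical vector with one entry per row, the deletion itself is a single subsetting step, and the number of rules can be changed by editing the range of the loop.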
Upvotes: 0
Reputation: 9344
First off, seq_along(1:3) is redundant, as that function will simply return 1:3. Second, if the result of your apply(..., 1, ...) call is a logical vector, you can simply subset using it:
my.df3[apply(my.df3, 1, ...), ]
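To make the subsetting pattern concrete, here is a minimal sketch on five rows taken from the question's data (rows 1, 7, 9, 10 and 13). The function keep_row is a hypothetical name introduced for illustration; it is one way to produce the logical vector, not necessarily what the elided arguments above were meant to be.

```r
# Five rows taken from the question's data.
m <- as.matrix(read.table(text = '
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9
0 1 0 1 1 1 0 0 0
0 0 1 0 0 0 0 0 0
1 1 1 1 0 0 1 0 1
0 1 0 0 0 0 0 0 1
1 0 1 1 0 0 0 1 0
', header = TRUE))

# Hypothetical helper: TRUE when none of the three rules drops row j.
# Rule i fires when j[i] is 0 and j[2*i + 2] or j[2*i + 3] is 1.
keep_row <- function(j) {
  i <- 1:3
  !any(j[i] == 0 & (j[2 * i + 2] == 1 | j[2 * i + 3] == 1))
}

# apply over rows returns one logical value per row ...
keep <- apply(m, 1, keep_row)

# ... which can index the rows directly; drop = FALSE keeps a matrix
# even if only one row survives.
kept <- m[keep, , drop = FALSE]
kept
```

Returning a single TRUE or FALSE per row is what lets the apply result be used as a row index without any further reshaping.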
Upvotes: 1