user1981275
user1981275

Reputation: 13372

Remove rows from data.table that meet condition

I have a data table

DT <- data.table(col1=c("a", "b", "c", "c", "a"), col2=c("b", "a", "c", "a", "b"), condition=c(TRUE, FALSE, FALSE, TRUE, FALSE))

   col1 col2 condition
1:    a    b      TRUE
2:    b    a     FALSE
3:    c    c     FALSE
4:    c    a      TRUE
5:    a    b     FALSE

and would like to remove rows on the following conditions:

So only row 3 should stay.

I'm doing this by making a new data table DTcond with all rows meeting the condition, looping over the values for col1 and col2, and collecting the indices from DT which will be removed.

DTcond <- DT[condition==TRUE,]
indices <- c()
for (i in 1:nrow(DTcond)) {
    n1 <- DTcond[i, col1]
    n2 <- DTcond[i, col2]
    indices <- c(indices, DT[ ((col1 == n1 & col2 == n2) | (col1==n2 & col2 == n1)), which=T])
}

DT[!indices,]
   col1 col2 condition
1:    c    c     FALSE

This works but is terrible slow for large datasets and I guess there must be other ways in data.table to do this without loops or apply. Any suggestions how I could improve this (I'm new to data.table)?

Upvotes: 0

Views: 1144

Answers (1)

Frank
Frank

Reputation: 66819

You can do an anti join:

mDT = DT[(condition), !"condition"][, rbind(.SD, rev(.SD), use.names = FALSE)]
DT[!mDT, on=names(mDT)]

#    col1 col2 condition
# 1:    c    c     FALSE

Upvotes: 4

Related Questions