Reputation: 13372
I have a data table
DT <- data.table(col1=c("a", "b", "c", "c", "a"), col2=c("b", "a", "c", "a", "b"), condition=c(TRUE, FALSE, FALSE, TRUE, FALSE))
col1 col2 condition
1: a b TRUE
2: b a FALSE
3: c c FALSE
4: c a TRUE
5: a b FALSE
and would like to remove rows on the following conditions:
condition==TRUE
(rows 1 and 4)condition==TRUE
(that is row 5, col1=a, col2=b)condition==TRUE
, but with col1 and col2 switched (that is row 2, col1=b and col2=a)So only row 3 should stay.
I'm doing this by making a new data table DTcond
with all rows meeting the condition, looping over the values for col1 and col2, and collecting the indices from DT
which will be removed.
DTcond <- DT[condition==TRUE,]
indices <- c()
for (i in 1:nrow(DTcond)) {
n1 <- DTcond[i, col1]
n2 <- DTcond[i, col2]
indices <- c(indices, DT[ ((col1 == n1 & col2 == n2) | (col1==n2 & col2 == n1)), which=T])
}
DT[!indices,]
col1 col2 condition
1: c c FALSE
This works but is terrible slow for large datasets and I guess there must be other ways in data.table to do this without loops or apply. Any suggestions how I could improve this (I'm new to data.table)?
Upvotes: 0
Views: 1144
Reputation: 66819
You can do an anti join:
mDT = DT[(condition), !"condition"][, rbind(.SD, rev(.SD), use.names = FALSE)]
DT[!mDT, on=names(mDT)]
# col1 col2 condition
# 1: c c FALSE
Upvotes: 4