filter out multiple rows based on the content of one element in a single row

Question

I have a dataframe with many rows that are all repeated values except for the values in column dd.

If any one row contains the value "ACT" in this non-repeated column I need to remove all the rows which otherwise match this "ACT" row as well as the "ACT" row itself. So in the sample code I would want to retain only the six rows which contain "c" and "e" in column aa.

I have tried a variety of nested if-else with for loops and trying to somehow filter out based on value in aa when the "ACT" exist in dd but have not been able to figure out how to get away from the single row vector matching.

aa <- c("b","b","b","c","c","c","d","d","d","e","e","e")
bb <- c("t","t","t","w","w","w","r","r","r","s","s","s")
cc <- c(1,1,1,2,2,2,3,3,3,4,4,4)
dd <- c("CVR","ACT","CVR","CVR","CVR","CVR","ACT","CVR","CVR","CVR","CVR","CVR")

Ideally I'm look for a tidyverse solution but am open to anything of course.

M-- · Accepted Answer

Using dplyr package:

library(dplyr)
df1 <- tibble(
  aa = c("b","b","b","c","c","c","d","d","d","e","e","e"),
  bb = c("t","t","t","w","w","w","r","r","r","s","s","s"),
  cc = c(1,1,1,2,2,2,3,3,3,4,4,4),
  dd = c("CVR","ACT","CVR","CVR","CVR","CVR","ACT","CVR","CVR","CVR","CVR","CVR")
)

anti_join(df1, df1[df1$dd=="ACT", ], by=c("aa","bb","cc"))
#> # A tibble: 6 x 4
#>   aa    bb       cc dd   
#>      
#> 1 c     w         2 CVR  
#> 2 c     w         2 CVR  
#> 3 c     w         2 CVR  
#> 4 e     s         4 CVR  
#> 5 e     s         4 CVR  
#> 6 e     s         4 CVR

Using data.table package:

library(data.table)
df2 <- data.table(
  aa = c("b","b","b","c","c","c","d","d","d","e","e","e"),
  bb = c("t","t","t","w","w","w","r","r","r","s","s","s"),
  cc = c(1,1,1,2,2,2,3,3,3,4,4,4),
  dd = c("CVR","ACT","CVR","CVR","CVR","CVR","ACT","CVR","CVR","CVR","CVR","CVR")
)

df2[!df2[dd=="ACT",], on = c("aa","bb","bb")]
#>    aa bb cc  dd
#> 1:  c  w  2 CVR
#> 2:  c  w  2 CVR
#> 3:  c  w  2 CVR
#> 4:  e  s  4 CVR
#> 5:  e  s  4 CVR
#> 6:  e  s  4 CVR

^{Created on 2019-06-19 by the reprex package (v0.3.0)}

filter out multiple rows based on the content of one element in a single row

Answers (2)

Related Questions