Bruce
Bruce

Reputation: 113

filter out multiple rows based on the content of one element in a single row

I have a dataframe with many rows that are all repeated values except for the values in column dd.

If any one row contains the value "ACT" in this non-repeated column I need to remove all the rows which otherwise match this "ACT" row as well as the "ACT" row itself. So in the sample code I would want to retain only the six rows which contain "c" and "e" in column aa.

I have tried a variety of nested if-else with for loops and trying to somehow filter out based on value in aa when the "ACT" exist in dd but have not been able to figure out how to get away from the single row vector matching.

aa <- c("b","b","b","c","c","c","d","d","d","e","e","e")
bb <- c("t","t","t","w","w","w","r","r","r","s","s","s")
cc <- c(1,1,1,2,2,2,3,3,3,4,4,4)
dd <- c("CVR","ACT","CVR","CVR","CVR","CVR","ACT","CVR","CVR","CVR","CVR","CVR")

Ideally I'm look for a tidyverse solution but am open to anything of course.

Upvotes: 2

Views: 75

Answers (2)

M--
M--

Reputation: 28825

  • Using dplyr package:
library(dplyr)
df1 <- tibble(
  aa = c("b","b","b","c","c","c","d","d","d","e","e","e"),
  bb = c("t","t","t","w","w","w","r","r","r","s","s","s"),
  cc = c(1,1,1,2,2,2,3,3,3,4,4,4),
  dd = c("CVR","ACT","CVR","CVR","CVR","CVR","ACT","CVR","CVR","CVR","CVR","CVR")
)

anti_join(df1, df1[df1$dd=="ACT", ], by=c("aa","bb","cc"))
#> # A tibble: 6 x 4
#>   aa    bb       cc dd   
#>   <chr> <chr> <dbl> <chr>
#> 1 c     w         2 CVR  
#> 2 c     w         2 CVR  
#> 3 c     w         2 CVR  
#> 4 e     s         4 CVR  
#> 5 e     s         4 CVR  
#> 6 e     s         4 CVR
  • Using data.table package:
library(data.table)
df2 <- data.table(
  aa = c("b","b","b","c","c","c","d","d","d","e","e","e"),
  bb = c("t","t","t","w","w","w","r","r","r","s","s","s"),
  cc = c(1,1,1,2,2,2,3,3,3,4,4,4),
  dd = c("CVR","ACT","CVR","CVR","CVR","CVR","ACT","CVR","CVR","CVR","CVR","CVR")
)

df2[!df2[dd=="ACT",], on = c("aa","bb","bb")]
#>    aa bb cc  dd
#> 1:  c  w  2 CVR
#> 2:  c  w  2 CVR
#> 3:  c  w  2 CVR
#> 4:  e  s  4 CVR
#> 5:  e  s  4 CVR
#> 6:  e  s  4 CVR

Created on 2019-06-19 by the reprex package (v0.3.0)

Upvotes: 3

IceCreamToucan
IceCreamToucan

Reputation: 28675

You can put the vectors in a data.table and keep only the (aa, bb, cc) groups which don't have "ACT" in the dd column.

library(data.table)

df <- data.table(
  aa = c("b","b","b","c","c","c","d","d","d","e","e","e"),
  bb = c("t","t","t","w","w","w","r","r","r","s","s","s"),
  cc = c(1,1,1,2,2,2,3,3,3,4,4,4),
  dd = c("CVR","ACT","CVR","CVR","CVR","CVR","ACT","CVR","CVR","CVR","CVR","CVR")
)

df[, if(!"ACT" %in% dd) .SD, .(aa, bb, cc)]
#    aa bb cc  dd
# 1:  c  w  2 CVR
# 2:  c  w  2 CVR
# 3:  c  w  2 CVR
# 4:  e  s  4 CVR
# 5:  e  s  4 CVR
# 6:  e  s  4 CVR

Upvotes: 0

Related Questions