Reputation: 113
I have a dataframe with many rows that are all repeated values except for the values in column dd
.
If any one row contains the value "ACT" in this non-repeated column I need to remove all the rows which otherwise match this "ACT" row as well as the "ACT" row itself. So in the sample code I would want to retain only the six rows which contain "c" and "e" in column aa
.
I have tried a variety of nested if-else with for loops and trying to somehow filter out based on value in aa
when the "ACT" exist in dd
but have not been able to figure out how to get away from the single row vector matching.
aa <- c("b","b","b","c","c","c","d","d","d","e","e","e")
bb <- c("t","t","t","w","w","w","r","r","r","s","s","s")
cc <- c(1,1,1,2,2,2,3,3,3,4,4,4)
dd <- c("CVR","ACT","CVR","CVR","CVR","CVR","ACT","CVR","CVR","CVR","CVR","CVR")
Ideally I'm look for a tidyverse
solution but am open to anything of course.
Upvotes: 2
Views: 75
Reputation: 28825
dplyr
package:library(dplyr)
df1 <- tibble(
aa = c("b","b","b","c","c","c","d","d","d","e","e","e"),
bb = c("t","t","t","w","w","w","r","r","r","s","s","s"),
cc = c(1,1,1,2,2,2,3,3,3,4,4,4),
dd = c("CVR","ACT","CVR","CVR","CVR","CVR","ACT","CVR","CVR","CVR","CVR","CVR")
)
anti_join(df1, df1[df1$dd=="ACT", ], by=c("aa","bb","cc"))
#> # A tibble: 6 x 4
#> aa bb cc dd
#> <chr> <chr> <dbl> <chr>
#> 1 c w 2 CVR
#> 2 c w 2 CVR
#> 3 c w 2 CVR
#> 4 e s 4 CVR
#> 5 e s 4 CVR
#> 6 e s 4 CVR
data.table
package:library(data.table)
df2 <- data.table(
aa = c("b","b","b","c","c","c","d","d","d","e","e","e"),
bb = c("t","t","t","w","w","w","r","r","r","s","s","s"),
cc = c(1,1,1,2,2,2,3,3,3,4,4,4),
dd = c("CVR","ACT","CVR","CVR","CVR","CVR","ACT","CVR","CVR","CVR","CVR","CVR")
)
df2[!df2[dd=="ACT",], on = c("aa","bb","bb")]
#> aa bb cc dd
#> 1: c w 2 CVR
#> 2: c w 2 CVR
#> 3: c w 2 CVR
#> 4: e s 4 CVR
#> 5: e s 4 CVR
#> 6: e s 4 CVR
Created on 2019-06-19 by the reprex package (v0.3.0)
Upvotes: 3
Reputation: 28675
You can put the vectors in a data.table and keep only the (aa, bb, cc) groups which don't have "ACT" in the dd column.
library(data.table)
df <- data.table(
aa = c("b","b","b","c","c","c","d","d","d","e","e","e"),
bb = c("t","t","t","w","w","w","r","r","r","s","s","s"),
cc = c(1,1,1,2,2,2,3,3,3,4,4,4),
dd = c("CVR","ACT","CVR","CVR","CVR","CVR","ACT","CVR","CVR","CVR","CVR","CVR")
)
df[, if(!"ACT" %in% dd) .SD, .(aa, bb, cc)]
# aa bb cc dd
# 1: c w 2 CVR
# 2: c w 2 CVR
# 3: c w 2 CVR
# 4: e s 4 CVR
# 5: e s 4 CVR
# 6: e s 4 CVR
Upvotes: 0