Reputation: 11
I have some semi-complex filtering I need to do:
Identifier <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5)
item1 <- c("a", "b", "c", "a", "b", "c", "d", "a", "b", "d", "b", "a", "c")
item2 <- c("x", "y", "z", "z", "x", "y", "z", "y", "z", "x", "y", "x", "y")
item3 <- c("p", "q", "r", "p", "q", "r", "p", "q", "r", "p", "q", "r", "p")
df1 <- data.frame(Identifier, item1, item2, item3)
df1
header <- c("Identifier","item1","item2","item3")
values <- c("1","b","y","p")
needed<- c("yes","yes","yes","no")
df2 <- data.frame(header, values, needed)
df2
I then want to use df2, to apply multiple filters on df1. So based on df2, I want to:
The goal is then to have df2 as an excel csv file, and the user can include what columns they would like filtered, and for what value. So, these would remain dynamic without them needing to edit the R code.
Upvotes: 1
Views: 49
Reputation: 887118
We may use Map
in base R
. When we filter
(keep or remove those elements, there is a possibility to have different lengths for each columns as showed before
Map(function(x, nm) {
i1 <- match(nm, df2$header)
if(df2$needed[i1] == "yes") x[x == df2$values[i1]]
else x[x != df2$values[i1]]} , df1, names(df1))
-output
$Identifier
[1] 1 1 1
$item1
[1] "b" "b" "b" "b"
$item2
[1] "y" "y" "y" "y" "y"
$item3
[1] "q" "r" "q" "r" "q" "r" "q" "r"
If we want to keep the data as data.frame
, it may be better to replace
those doesn't conform to the logic as NA
library(dplyr)
df1 %>%
mutate(across(everything(), ~ {
i1 <- match(cur_column(), df2$header)
case_when((df2$needed[i1] == "yes" &.x == df2$values[i1])|
(df2$needed[i1] == "no" & .x != df2$values[i1]) ~ .x )
}))
-output
Identifier item1 item2 item3
1 1 <NA> <NA> <NA>
2 1 b y q
3 1 <NA> <NA> r
4 NA <NA> <NA> <NA>
5 NA b <NA> q
6 NA <NA> y r
7 NA <NA> <NA> <NA>
8 NA <NA> y q
9 NA b <NA> r
10 NA <NA> <NA> <NA>
11 NA b y q
12 NA <NA> <NA> r
13 NA <NA> y <NA>
If we need a single value
df1 %>%
mutate(across(everything(), ~ {
i1 <- match(cur_column(), df2$header)
case_when((df2$needed[i1] == "yes" &.x == df2$values[i1])|
(df2$needed[i1] == "no" & .x != df2$values[i1]) ~ .x )
})) %>%
summarise(across(everything(), ~ .x[complete.cases(.x)][1]))
-output
Identifier item1 item2 item3
1 1 b y q
Upvotes: 0