jkfirewood
jkfirewood

Reputation: 11

Filter different columns dataframe in R based on another dataframe

I have some semi-complex filtering I need to do:

Identifier <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5)
item1 <- c("a", "b", "c", "a", "b", "c", "d", "a", "b", "d", "b", "a", "c")
item2 <- c("x", "y", "z", "z", "x", "y", "z", "y", "z", "x", "y", "x", "y")
item3 <- c("p", "q", "r", "p", "q", "r", "p", "q", "r", "p", "q", "r", "p")
df1 <- data.frame(Identifier, item1, item2, item3)
df1

header <- c("Identifier","item1","item2","item3")
values <- c("1","b","y","p")
needed<- c("yes","yes","yes","no")
df2 <- data.frame(header, values, needed)
df2

I then want to use df2, to apply multiple filters on df1. So based on df2, I want to:

The goal is then to have df2 as an excel csv file, and the user can include what columns they would like filtered, and for what value. So, these would remain dynamic without them needing to edit the R code.

Upvotes: 1

Views: 49

Answers (1)

akrun
akrun

Reputation: 887118

We may use Map in base R. When we filter (keep or remove those elements, there is a possibility to have different lengths for each columns as showed before

Map(function(x, nm) {
       i1 <- match(nm, df2$header)        
        if(df2$needed[i1] == "yes") x[x == df2$values[i1]] 
       else x[x != df2$values[i1]]} , df1, names(df1))

-output

$Identifier
[1] 1 1 1

$item1
[1] "b" "b" "b" "b"

$item2
[1] "y" "y" "y" "y" "y"

$item3
[1] "q" "r" "q" "r" "q" "r" "q" "r"

If we want to keep the data as data.frame, it may be better to replace those doesn't conform to the logic as NA

library(dplyr)
df1 %>%
   mutate(across(everything(), ~  {
    i1 <- match(cur_column(), df2$header)
   case_when((df2$needed[i1] == "yes" &.x == df2$values[i1])|
      (df2$needed[i1] == "no" & .x != df2$values[i1]) ~ .x )
  }))

-output

   Identifier item1 item2 item3
1           1  <NA>  <NA>  <NA>
2           1     b     y     q
3           1  <NA>  <NA>     r
4          NA  <NA>  <NA>  <NA>
5          NA     b  <NA>     q
6          NA  <NA>     y     r
7          NA  <NA>  <NA>  <NA>
8          NA  <NA>     y     q
9          NA     b  <NA>     r
10         NA  <NA>  <NA>  <NA>
11         NA     b     y     q
12         NA  <NA>  <NA>     r
13         NA  <NA>     y  <NA>

If we need a single value

df1 %>%
   mutate(across(everything(), ~  {
    i1 <- match(cur_column(), df2$header)
   case_when((df2$needed[i1] == "yes" &.x == df2$values[i1])|
      (df2$needed[i1] == "no" & .x != df2$values[i1]) ~ .x )
  })) %>%
   summarise(across(everything(), ~ .x[complete.cases(.x)][1]))

-output

   Identifier item1 item2 item3
1          1     b     y     q

Upvotes: 0

Related Questions