Arun
Arun

Reputation: 669

Filter data.frame with another data.frame using select(contains)

I have 2 dataframes like the following:

df1

colA
A
B
C
D

df2

one  two
x   A
y   A;B
z   A;D;C
p   E
q   F

I want to filter df2 for entries contained in df1. i.e "two" containing values of colA, so that my output will be

   one  two
    x   A
    y   A;B
    z   A;D;C

I tried all these options that didn't work

df2filtered = df2 %>% filter(two %in% df1$colA)
df2filtered = df2 %>% filter(two %in% str_detect(df1$colA))
df2filtered = df2 %>% select(two, contains(df1$colA))

str_detect with character works but not when given in df like above. What is the right solution?

Upvotes: 1

Views: 281

Answers (3)

nniloc
nniloc

Reputation: 4243

Another option using str_detect. You can collapse df1$colA so that str_detect searches for A or B or C or D. e.g. "A|B|C|D".

library(tidyverse)

df2 %>% filter(str_detect(two, paste(df1$colA, collapse = '|')))

#>   one   two           
#> 1 x     A     
#> 2 y     A;B   
#> 3 z     A;D;C 

Upvotes: 0

tjebo
tjebo

Reputation: 23757

Your data is not "tidy". I'd reshape it into a long format. Then, filtering becomes easy.

Below an approach which makes use of an non-exported function of the eye package in order to split the column into an unknown number of columns. (disclaimer: I am the author of this package. The function was inspired and modified from this answer). Then pivot the result longer and filter by the presence in df1$colA. I'd leave the result in a tidy format, but you can of course melt it back to your rather messy shape.

library(tidyverse)

df1 <- read.table(text = "colA
A
B
C
D", header = TRUE)

df2 <- read.table(text = "one  two
x   A
y   A;B
z   A;D;C
p   E
q   F  ", header = TRUE)

#install.packages("eye")
eye:::split_mult(df2, "two", pattern = ";" ) %>%
  pivot_longer(cols = starts_with("var"), names_to = "var", values_to = "val") %>%
  drop_na(val)%>%
  select(-var) %>%
  group_by(one) %>%
  filter(any(val %in% df1$colA))
#> # A tibble: 6 x 2
#> # Groups:   one [3]
#>   one   val  
#>   <chr> <chr>
#> 1 x     A    
#> 2 y     A    
#> 3 y     B    
#> 4 z     A    
#> 5 z     D    
#> 6 z     C

Created on 2021-07-14 by the reprex package (v2.0.0)


because this function might change in the future, here for future reference:

split_mult <- function (x, col, pattern = "_", into = NULL, prefix = "var", 
    sep = "") 
{
    cols <- stringr::str_split_fixed(x[[col]], pattern, n = Inf)
    cols[which(cols == "")] <- NA_character_
    m <- dim(cols)[2]
    if (length(into) == m) {
        colnames(cols) <- into
    }
    else {
        colnames(cols) <- paste(prefix, 1:m, sep = sep)
    }
    cbind(cols, x[names(x) != col])
}

Upvotes: 0

Here's one way to obtaning the desired output using map to create an extra column to afterwards apply the filter.

library(tidyverse)
df2 %>%
  # Use map to check if any string in df1$colA is found in
  # df2$two; then use any to check if any entry is T
  mutate(stay =  map(two, function(x){
    any(str_detect(x,df1$colA))
  })) %>%
  # Filter
  filter(stay == T) %>%
  # Remove extra column
  select(-c(stay))

#  one   two
#1   x     A
#2   y   A;B
#3   z A;D;C

Upvotes: 1

Related Questions