Reputation: 669
I have 2 dataframes like the following:
df1
colA
A
B
C
D
df2
one two
x A
y A;B
z A;D;C
p E
q F
I want to filter df2 for entries contained in df1. i.e "two" containing values of colA, so that my output will be
one two
x A
y A;B
z A;D;C
I tried all these options that didn't work
df2filtered = df2 %>% filter(two %in% df1$colA)
df2filtered = df2 %>% filter(two %in% str_detect(df1$colA))
df2filtered = df2 %>% select(two, contains(df1$colA))
str_detect
with character works but not when given in df like above. What is the right solution?
Upvotes: 1
Views: 281
Reputation: 4243
Another option using str_detect
. You can collapse df1$colA
so that str_detect
searches for A
or B
or C
or D
. e.g. "A|B|C|D"
.
library(tidyverse)
df2 %>% filter(str_detect(two, paste(df1$colA, collapse = '|')))
#> one two
#> 1 x A
#> 2 y A;B
#> 3 z A;D;C
Upvotes: 0
Reputation: 23757
Your data is not "tidy". I'd reshape it into a long format. Then, filtering becomes easy.
Below an approach which makes use of an non-exported function of the eye package in order to split the column into an unknown number of columns. (disclaimer: I am the author of this package. The function was inspired and modified from this answer). Then pivot the result longer and filter by the presence in df1$colA
. I'd leave the result in a tidy format, but you can of course melt it back to your rather messy shape.
library(tidyverse)
df1 <- read.table(text = "colA
A
B
C
D", header = TRUE)
df2 <- read.table(text = "one two
x A
y A;B
z A;D;C
p E
q F ", header = TRUE)
#install.packages("eye")
eye:::split_mult(df2, "two", pattern = ";" ) %>%
pivot_longer(cols = starts_with("var"), names_to = "var", values_to = "val") %>%
drop_na(val)%>%
select(-var) %>%
group_by(one) %>%
filter(any(val %in% df1$colA))
#> # A tibble: 6 x 2
#> # Groups: one [3]
#> one val
#> <chr> <chr>
#> 1 x A
#> 2 y A
#> 3 y B
#> 4 z A
#> 5 z D
#> 6 z C
Created on 2021-07-14 by the reprex package (v2.0.0)
because this function might change in the future, here for future reference:
split_mult <- function (x, col, pattern = "_", into = NULL, prefix = "var",
sep = "")
{
cols <- stringr::str_split_fixed(x[[col]], pattern, n = Inf)
cols[which(cols == "")] <- NA_character_
m <- dim(cols)[2]
if (length(into) == m) {
colnames(cols) <- into
}
else {
colnames(cols) <- paste(prefix, 1:m, sep = sep)
}
cbind(cols, x[names(x) != col])
}
Upvotes: 0
Reputation: 5620
Here's one way to obtaning the desired output using map
to create an extra column to afterwards apply the filter.
library(tidyverse)
df2 %>%
# Use map to check if any string in df1$colA is found in
# df2$two; then use any to check if any entry is T
mutate(stay = map(two, function(x){
any(str_detect(x,df1$colA))
})) %>%
# Filter
filter(stay == T) %>%
# Remove extra column
select(-c(stay))
# one two
#1 x A
#2 y A;B
#3 z A;D;C
Upvotes: 1