Reputation: 327
I have a data frame (a tibble, actually) df, with two columns, a and b, and I want to keep only the rows in which a is a substring of b. I've tried
df %>%
  dplyr::filter(grepl(a, b))
but I get a warning that seems to indicate that R is actually applying grepl with the first argument being the whole column a.
Is there any way to apply a regular expression involving two different columns to each row in a tibble (or data frame)?
Upvotes: 1
Views: 1906
Reputation: 5704
You can use stringr::str_detect, which is vectorised over both string and pattern. (Whereas, as you noted, grepl is only vectorised over its string argument.)
Using @Chi Pak's example:
library(dplyr)
library(stringr)
df %>%
  filter(str_detect(B, fixed(A)))
#   A  B
# 1 b db
# 2 e ge
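To illustrate why fixed() matters here, below is a minimal sketch with a made-up data frame (df2 and its values are hypothetical, not from the question) in which column A contains regex metacharacters. Without fixed(), each A value is interpreted as a regular expression, which is not the same as a substring test.

```r
library(dplyr)
library(stringr)

# Hypothetical data: some A values contain regex metacharacters
df2 <- data.frame(A = c("a.c", "x+y", "1"),
                  B = c("abc", "x+y=z", "123"),
                  stringsAsFactors = FALSE)

# Regex interpretation: "a.c" matches "abc" because "." matches any character,
# while the regex "x+y" does not match the literal text "x+y=z"
regex_hits <- str_detect(df2$B, df2$A)

# Literal interpretation: only true substrings count
fixed_hits <- str_detect(df2$B, fixed(df2$A))

# Keep rows where A is a literal substring of B
df2 %>%
  filter(str_detect(B, fixed(A)))
```

If you actually want A treated as a regular expression, drop the fixed() wrapper.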
Upvotes: 1
Reputation: 1433
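Or using base R sapply and @Chi-Pak's reproducible example:
df <- data.frame(A = letters[1:5],
                 B = paste0(letters[3:7], letters[c(2, 2, 4, 3, 5)]),
                 stringsAsFactors = FALSE)
# Test each row separately: is A[i] found in B[i]?
# (grepl treats A[i] as a regex; add fixed = TRUE for a literal substring match)
matched <- sapply(seq_len(nrow(df)), function(i) grepl(df$A[i], df$B[i]))
df[matched, ]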
Result
  A  B
2 b db
5 e ge
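The same row-by-row idea can probably be written a bit more compactly with mapply, which walks over both columns in parallel. This is only a sketch of an alternative, not a replacement for the code above; fixed = TRUE is my addition, on the assumption that you want a literal substring match rather than a regex.

```r
df <- data.frame(A = letters[1:5],
                 B = paste0(letters[3:7], letters[c(2, 2, 4, 3, 5)]),
                 stringsAsFactors = FALSE)

# mapply() calls grepl(A[i], B[i], fixed = TRUE) once per row i
matched <- mapply(grepl, df$A, df$B,
                  MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

# Keep only the rows where A is found inside B
df[matched, ]
```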
Upvotes: 1
Reputation: 13581
If you're only interested in by-row comparisons, you can use rowwise():
library(dplyr)

df <- data.frame(A = letters[1:5],
                 B = paste0(letters[3:7], letters[c(2, 2, 4, 3, 5)]),
                 stringsAsFactors = FALSE)
# rowwise() makes filter() evaluate grepl() one row at a time
df %>%
  rowwise() %>%
  filter(grepl(A, B))
  A  B
1 b db
2 e ge
---------------------------------------------------------------------------------
If you want to know whether each row's entry of A appears in any element of B:
df %>%
  rowwise() %>%
  filter(any(grepl(A, df$B)))
  A  B
1 b db
2 c ed
3 d fc
4 e ge
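For comparison, here is a rough base R sketch of the same "found in any element of B" check without rowwise(). The name in_any_B is just an illustrative choice, and fixed = TRUE assumes a literal substring match is what you want.

```r
df <- data.frame(A = letters[1:5],
                 B = paste0(letters[3:7], letters[c(2, 2, 4, 3, 5)]),
                 stringsAsFactors = FALSE)

# For each value of A, check whether it occurs in at least one element of B
in_any_B <- vapply(df$A,
                   function(a) any(grepl(a, df$B, fixed = TRUE)),
                   logical(1), USE.NAMES = FALSE)

# Keep the rows whose A value appears somewhere in column B
df[in_any_B, ]
```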
Upvotes: 4