Daniel Miller

Reputation: 327

In R, how do you compare two columns with a regex, row by row?

I have a data frame (a tibble, actually), df, with two columns, a and b, and I want to keep only the rows in which a is a substring of b. I've tried

df %>%
  dplyr::filter(grepl(a,b))

but I get a warning suggesting that grepl receives the whole column a as its pattern argument (and uses only its first element), rather than one value of a per row.

Is there any way to apply a regular expression involving two different columns to each row in a tibble (or data frame)?
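A minimal base R sketch of what goes wrong, using the same toy data the answers below construct: grepl() accepts a single pattern, so passing the whole column A as the pattern makes it match every row against the first element only.

```r
# Toy data, built the same way as in the answers below
df <- data.frame(A = letters[1:5],
                 B = paste0(letters[3:7], letters[c(2, 2, 4, 3, 5)]),
                 stringsAsFactors = FALSE)

# grepl() is vectorised over x but not over pattern: given a length-5
# pattern it warns and tests every element of B against df$A[1] ("a")
wrong <- suppressWarnings(grepl(df$A, df$B))
print(wrong)

# Same result as using the first pattern on its own
identical(wrong, grepl(df$A[1], df$B))
```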

Upvotes: 1

Views: 1906

Answers (3)

Scarabee

Reputation: 5704

You can use stringr::str_detect(), which is vectorised over both its string and pattern arguments. By contrast, as you noted, grepl() is vectorised only over its x argument (the strings being searched), not over pattern.

Using @Chi Pak's example:

library(dplyr)
library(stringr)

df %>% 
  filter(str_detect(B, fixed(A)))
#   A  B
# 1 b db
# 2 e ge
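For comparison, here is a base R sketch of the same pairwise, literal matching without stringr. It uses mapply(), which calls grepl() once per (A, B) pair; fixed = TRUE plays the role of stringr's fixed(). This is an alternative technique, not something stringr does internally.

```r
df <- data.frame(A = letters[1:5],
                 B = paste0(letters[3:7], letters[c(2, 2, 4, 3, 5)]),
                 stringsAsFactors = FALSE)

# mapply() walks A and B in parallel, so each pattern is tested only
# against the string in its own row; fixed = TRUE treats A literally
matched <- mapply(function(pattern, string) grepl(pattern, string, fixed = TRUE),
                  df$A, df$B, USE.NAMES = FALSE)

df[matched, ]
```

Base subsetting keeps the original row names (2 and 5), whereas dplyr::filter() renumbers the rows.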

Upvotes: 1

Damian

Reputation: 1433

Or, using base R's sapply() and @Chi-Pak's reproducible example:

df <- data.frame(A = letters[1:5],
                 B = paste0(letters[3:7], letters[c(2, 2, 4, 3, 5)]),
                 stringsAsFactors = FALSE)

# Test each row's A against that same row's B
matched <- sapply(seq_len(nrow(df)), function(i) grepl(df$A[i], df$B[i]))

df[matched, ]

Result:

  A  B
2 b db
5 e ge
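One caveat: grepl() interprets each entry of A as a regular expression, so entries containing metacharacters such as "." do not behave like plain substrings. A minimal sketch with hypothetical data (the values in tricky are made up for illustration), using vapply(), a type-checked variant of sapply(), to compare the default regex matching with fixed = TRUE:

```r
# Hypothetical data: "." is a regex metacharacter that matches any character
tricky <- data.frame(A = c(".", "b"),
                     B = c("xy", "a.b"),
                     stringsAsFactors = FALSE)

# Row-by-row matching with A read as a regular expression
as_regex <- vapply(seq_len(nrow(tricky)),
                   function(i) grepl(tricky$A[i], tricky$B[i]),
                   logical(1))

# Same loop, but A is treated as a literal substring
as_literal <- vapply(seq_len(nrow(tricky)),
                     function(i) grepl(tricky$A[i], tricky$B[i], fixed = TRUE),
                     logical(1))

print(as_regex)    # TRUE TRUE: "." matches any character in "xy"
print(as_literal)  # FALSE TRUE: "xy" contains no literal "."
```

If a really is meant as a substring rather than a pattern, fixed = TRUE is the safer choice.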

Upvotes: 1

CPak

Reputation: 13581

If you're only interested in by-row comparisons, you can use rowwise():

df <- data.frame(A = letters[1:5],
                 B = paste0(letters[3:7], letters[c(2, 2, 4, 3, 5)]),
                 stringsAsFactors = FALSE)

df %>% 
   rowwise() %>% 
   filter(grepl(A,B))

       A      B
1      b     db
2      e     ge

---------------------------------------------------------------------------------
If you want to know whether each row's entry of A occurs anywhere in column B (in any row, not just its own):

df %>% rowwise() %>% filter(any(grepl(A,df$B)))

      A     B
1     b    db
2     c    ed
3     d    fc
4     e    ge
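The same "does this entry of A occur anywhere in B" check can be sketched in base R without rowwise(): test each pattern against the whole B column and collapse the result with any().

```r
df <- data.frame(A = letters[1:5],
                 B = paste0(letters[3:7], letters[c(2, 2, 4, 3, 5)]),
                 stringsAsFactors = FALSE)

# For each entry of A, grepl() scans every element of B;
# any() reduces that to a single TRUE/FALSE per row
in_any_b <- vapply(df$A, function(pattern) any(grepl(pattern, df$B)),
                   logical(1), USE.NAMES = FALSE)

df[in_any_b, ]
```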

Upvotes: 4
