Catherine Laing

Reputation: 657

Create new column to show partial matches across strings in dplyr

I'm trying to create a new column to show whether there is any match across strings in two columns in my dataframe. This question is almost what I'm asking, but rather than filtering, I want to create a new column to show whether there is a match or not (TRUE or FALSE).

So here is an example dataframe:

 transcript        target
 he saw the dog    saw
 she gave them it  gave
 watch out for     danger
 real bravery      brave

And I want to create a new column showing any match between the two:

 transcript        target    match
 he saw the dog    saw       T
 she gave them it  gave      T
 watch out for     danger    F
 real bravery      brave     T

I would prefer to use dplyr, but I am open to other suggestions!

Upvotes: 3

Views: 1002

Answers (3)

MKR

Reputation: 20095

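One option is to use dplyr::rowwise() with grepl() to create the match column. rowwise() is needed because grepl() uses only the first element of a pattern vector, so the comparison has to be made one row at a time: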

library(dplyr)

df %>%
  rowwise() %>%
  mutate(match = grepl(target, transcript)) %>%
  as.data.frame()

#         transcript target match
# 1   he saw the dog    saw  TRUE
# 2 she gave them it   gave  TRUE
# 3    watch out for danger FALSE
# 4     real bravery  brave  TRUE

Data:

df <- read.table(text = 
"transcript        target
'he saw the dog'    saw
'she gave them it'  gave
'watch out for'     danger
'real bravery'      brave",
header = TRUE, stringsAsFactors = FALSE)
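
One caveat with the rowwise() approach: grepl() treats target as a regular expression by default, so a target containing characters such as "." or "(" may not behave like a plain substring. A minimal sketch of literal matching with fixed = TRUE follows; the df_meta data frame and its values are made up for illustration and are not from the question.

```r
library(dplyr)

# Hypothetical data: the last target contains the regex metacharacter "."
df_meta <- data.frame(
  transcript = c("he saw the dog", "real bravery", "abc"),
  target     = c("saw", "brave", "a.c"),
  stringsAsFactors = FALSE
)

res <- df_meta %>%
  rowwise() %>%
  mutate(
    # Default: target is interpreted as a regex, so "a.c" matches "abc"
    match_regex = grepl(target, transcript),
    # fixed = TRUE: target is matched as a literal substring
    match_fixed = grepl(target, transcript, fixed = TRUE)
  ) %>%
  ungroup()

res
```

For the data in the question the two columns agree, since none of the targets contain metacharacters; the difference only shows up for rows like the third one here.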

Upvotes: 1

phiver

Reputation: 23598

You asked for a dplyr method, but here is also a base R method using grepl:

df1$match <- mapply(grepl, df1$target, df1$transcript)

df1
        transcript target match
1   he saw the dog    saw  TRUE
2 she gave them it   gave  TRUE
3    watch out for danger FALSE
4     real bravery  brave  TRUE

The same approach, using grepl inside a dplyr mutate statement:

df1 %>% 
  mutate(match = mapply(grepl, target, transcript))

        transcript target match
1   he saw the dog    saw  TRUE
2 she gave them it   gave  TRUE
3    watch out for danger FALSE
4     real bravery  brave  TRUE
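
A small detail worth knowing about the mapply() calls above: when the first vector passed to mapply() is an unnamed character vector, the result is named after it by default, so the logical vector carries the target strings as names. The sketch below rebuilds the example data by hand (an assumption, since this answer does not show how df1 was created) and uses USE.NAMES = FALSE to get a plain logical vector.

```r
# Example data rebuilt by hand; assumed to mirror df1 from the answer
df1 <- data.frame(
  transcript = c("he saw the dog", "she gave them it",
                 "watch out for", "real bravery"),
  target     = c("saw", "gave", "danger", "brave"),
  stringsAsFactors = FALSE
)

# Default behaviour: result is named after the first vector (target)
named <- mapply(grepl, df1$target, df1$transcript)

# USE.NAMES = FALSE returns an unnamed logical vector
unnamed <- mapply(grepl, df1$target, df1$transcript, USE.NAMES = FALSE)

names(named)    # the target strings
names(unnamed)  # NULL

df1$match <- unnamed
df1
```

The names are harmless for display, but dropping them can avoid surprises when the column is later compared with identical() or exported.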

Upvotes: 2

A. Suliman

Reputation: 13125

Using stringr::str_detect, we can check whether transcript contains target:

library(stringr)
library(dplyr)
df %>%
  # Convert factor columns to character; skip this step if transcript
  # and target are already character columns
  mutate_if(is.factor, as.character) %>%
  mutate(match = str_detect(transcript, target))


         transcript target match
1   he saw the dog    saw  TRUE
2 she gave them it   gave  TRUE
3    watch out for danger FALSE
4     real bravery  brave  TRUE
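
Two things make the str_detect() version work without rowwise() or mapply(): it is vectorised over both the string and the pattern, and it takes the string first and the pattern second (the reverse of grepl()). The sketch below, which rebuilds the example data by hand as df_chk (a name chosen here for illustration), checks that it gives the same answer as the element-wise grepl() approach.

```r
library(stringr)
library(dplyr)

# Example data rebuilt by hand; df_chk is a hypothetical name
df_chk <- data.frame(
  transcript = c("he saw the dog", "she gave them it",
                 "watch out for", "real bravery"),
  target     = c("saw", "gave", "danger", "brave"),
  stringsAsFactors = FALSE
)

res <- df_chk %>%
  mutate(
    # str_detect(string, pattern): compared pairwise, row by row
    match_stringr = str_detect(transcript, target),
    # grepl(pattern, x) needs mapply() to be applied pairwise
    match_base = mapply(grepl, target, transcript, USE.NAMES = FALSE)
  )

# Both approaches should agree on this data
identical(res$match_stringr, res$match_base)
```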

Upvotes: 3
