Sebastian Hesse

Reputation: 545

Compare one col with another containing multiple entries

We need to compare one column of a data frame with another and identify whether the entry in column "a" matches any of the space-separated entries in column "b" of the same row. The result should be a new column with T = match or F = no match.

# task df
df <- data.frame(
  a = c("ABC", 'ABB', 'ACC', 'AAG'),
  b = c("XXC TTZ", "XCT ABB", "TTG WHO ACC", 'AAG')
)

# expected result
df <- data.frame(
  a = c("ABC", 'ABB', 'ACC', 'AAG'),
  b = c("XXC TTZ", "XCT ABB", "TTG WHO ACC", 'AAG'),
  match = c("F", "T", "T", "T")
)

I just came out of a one-year clinical rotation, so my coding has gotten a bit rusty. I could not find an answer here; excuse the hassle if this has been asked before. I guess the solution is rather straightforward, but I can't wrap my head around it. Thanks a lot for helping (dplyr solutions much appreciated).

Upvotes: 1

Views: 40

Answers (2)

ThomasIsCoding

Reputation: 101403

A base R option

transform(
  df,
  match = mapply(grepl, a, b, USE.NAMES = FALSE)
)

gives

    a           b match
1 ABC     XXC TTZ FALSE
2 ABB     XCT ABB  TRUE
3 ACC TTG WHO ACC  TRUE
4 AAG         AAG  TRUE
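
One caveat worth noting: grepl looks for a as a pattern anywhere inside b, so a partial hit counts too (for example, grepl("AB", "XCT ABB") is TRUE). If "matches any of the entries" should mean an exact match against one whole entry, a minimal sketch could look like the following. It assumes that the entries in b are separated by single spaces, which the question does not state explicitly.

```r
# Same data as in the question
df <- data.frame(
  a = c("ABC", "ABB", "ACC", "AAG"),
  b = c("XXC TTZ", "XCT ABB", "TTG WHO ACC", "AAG"),
  stringsAsFactors = FALSE
)

# Assumption: entries in b are separated by single spaces
tokens <- strsplit(df$b, " ", fixed = TRUE)

# TRUE only if a equals one whole entry of b (no partial or regex matches)
df$match <- mapply(function(x, tk) x %in% tk, df$a, tokens, USE.NAMES = FALSE)
df
```

For the sample data this agrees with the grepl version; the two only differ when a is a substring of an entry or contains regex metacharacters.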

Upvotes: 1

akrun

Reputation: 887138

Use str_detect from stringr, which is vectorized over both string and pattern:

library(stringr)
library(dplyr)
df %>%
  mutate(match = str_detect(b, a))

    a           b match
1 ABC     XXC TTZ FALSE
2 ABB     XCT ABB  TRUE
3 ACC TTG WHO ACC  TRUE
4 AAG         AAG  TRUE
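
The question asks for "T"/"F" labels rather than a logical column, and str_detect treats the pattern as a regular expression by default. A possible variant, sketched here under the assumption that a should be matched as literal text, wraps the pattern in fixed() and maps the logical result to the requested labels. The name result is only illustrative.

```r
library(dplyr)
library(stringr)

# Same data as in the question
df <- data.frame(
  a = c("ABC", "ABB", "ACC", "AAG"),
  b = c("XXC TTZ", "XCT ABB", "TTG WHO ACC", "AAG"),
  stringsAsFactors = FALSE
)

# fixed() makes stringr compare a literally instead of as a regex;
# if_else() turns TRUE/FALSE into the "T"/"F" labels from the question
result <- df %>%
  mutate(match = if_else(str_detect(b, fixed(a)), "T", "F"))
result
```

Keeping the column logical is usually more convenient for filtering later on, so the relabelling step is only needed if the character labels are really required.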

Upvotes: 1
