Reputation: 343
To solve a tag migration problem, I need to compare two character columns and check, row by row, whether they have any tags in common.
To sum up, given a dataframe like this:
old_tags         new_tags
burger           burger, american
italian, pizza   italian
latin, peruvian  peruvian, latin
french           pizza
I'd like to add a third column like this one:
old_tags         new_tags          match
burger           burger, american  TRUE
italian, pizza   italian           TRUE
latin, peruvian  peruvian, latin   TRUE
french           pizza             FALSE
So far I've tried functions such as str_match and str_detect without success. They usually return FALSE when comparing pairs of strings that should actually give TRUE, such as the example in row 3 ([3,]).
Thanks a lot in advance.
Upvotes: 2
Views: 2089
Reputation: 886938
Or we can use str_extract_all with any:
library(tidyverse)
df %>%
mutate(match = map2_lgl(str_extract_all(old_tags, "\\w+"),
str_extract_all(new_tags, "\\w+"), ~ any(.x %in% .y)))
# old_tags new_tags match
#1 burger burger, american TRUE
#2 italian, pizza italian TRUE
#3 latin, peruvian peruvian, latin TRUE
#4 french pizza FALSE
df <- structure(list(old_tags = c("burger", "italian, pizza", "latin, peruvian",
"french"), new_tags = c("burger, american", "italian", "peruvian, latin",
"pizza")), row.names = c(NA, -4L), class = "data.frame")
Upvotes: 0
Reputation: 13309
A tidyverse-base possibility:
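library(dplyr)
library(purrr)   # provides map_chr()
library(stringr)
df %>%
  # Turn each old_tags entry into a regex alternation, e.g. "latin|peruvian"
  mutate(patterns = map_chr(strsplit(old_tags, ", "), paste, collapse = "|"),
         Match = str_detect(new_tags, patterns)) %>%
  select(-patterns)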
old_tags new_tags Match
1 burger burger, american TRUE
2 italian, pizza italian TRUE
3 latin, peruvian peruvian, latin TRUE
4 french pizza FALSE
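One caveat with the alternation pattern: str_detect matches substrings, so an old tag such as "bar" would also match a new tag such as "barbecue". A minimal sketch of one way around this, assuming tags contain only word characters (no regex metacharacters), is to wrap the alternation in \\b word boundaries. The data frame `tags` and the column name `match` below are made up for illustration and are not from the question.

```r
library(dplyr)
library(purrr)
library(stringr)

# Hypothetical example data: "bar" is a substring of "barbecue"
tags <- data.frame(
  old_tags = c("bar", "latin, peruvian"),
  new_tags = c("barbecue", "peruvian, latin"),
  stringsAsFactors = FALSE
)

result <- tags %>%
  mutate(
    # Build "\\b(tag1|tag2)\\b" so only whole words can match
    patterns = map_chr(strsplit(old_tags, ", "),
                       ~ paste0("\\b(", paste(.x, collapse = "|"), ")\\b")),
    match = str_detect(new_tags, patterns)
  ) %>%
  select(-patterns)

result
```

With the plain "bar" pattern the first row would come out TRUE; with the boundaries it should be FALSE, while the second row still matches.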
Upvotes: 1
Reputation: 388817
One base R approach is to split each string on the comma, use Map with intersect to find the tags shared by each pair of rows, and return TRUE when at least one tag intersects.
df$match <- lengths(Map(intersect, strsplit(df$old_tags, ", "),
strsplit(df$new_tags, ", "))) > 0
df
# old_tags new_tags match
#1 burger burger, american TRUE
#2 italian, pizza italian TRUE
#3 latin, peruvian peruvian, latin TRUE
#4 french pizza FALSE
data
df <- structure(list(old_tags = c("burger", "italian, pizza", "latin, peruvian",
"french"), new_tags = c("burger, american", "italian", "peruvian, latin",
"pizza")), row.names = c(NA, -4L), class = "data.frame")
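If the real tags are less tidy than the sample (inconsistent spacing around commas, mixed case), the same split-and-intersect idea can be wrapped in a small helper. This is only a sketch: `has_common_tag` is a hypothetical name, and lower-casing and trimming are assumptions about how the tags should be compared, not something stated in the question.

```r
# Hypothetical helper: TRUE when two comma-separated tag strings share a tag.
# Assumes comparison should ignore case and whitespace around commas.
has_common_tag <- function(old, new) {
  split_tags <- function(x) {
    parts <- strsplit(tolower(x), "[[:space:]]*,[[:space:]]*")
    lapply(parts, trimws)  # drop leading/trailing blanks on each tag
  }
  lengths(Map(intersect, split_tags(old), split_tags(new))) > 0
}

res <- has_common_tag(c("Burger", "italian ,pizza", "french"),
                      c("burger,american", "italian", "pizza"))
res
```

It is vectorised over both arguments, so it could be used as df$match <- has_common_tag(df$old_tags, df$new_tags).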
Upvotes: 2