Reputation: 660
I have a data frame with a concatenated string, the last 11 digits of which are a census tract. I have a separate list of strings, in which the last 2 or 5 digits represent states or counties, respectively. I've appended a * to the end of each 2- or 5-digit ID. I need to go through the data frame and flag whether the trans variable (census tract) matches anything in the patterns vector (state or county), allowing the * to represent the remaining 9 or 6 digits in trans.
As shown in the code below, I've gotten this to work by collapsing all the patterns into a single string with collapse="|" and running grepl on the result. However, I'm wondering if I can accomplish this through a vector operation because 1) it feels like I should be able to, and 2) in practice, the list of patterns is enormous and it feels foolish to put them all into a single character variable.
Is there anything similar to the %in% operator, but with regex/wildcard character support?
library(dplyr)
trans <- c("1-IA-45045000100",
"2-IA-23003001801",
"3-LITP-01001000100",
"4-OTP-06006000606",
"4-OTP-06010001001",
"1-IA-45001010002",
"2-IA-45045000101",
"2-LITP-23005005002")
df <- data.frame(id = 1:8, trans)
patterns <- c("1-IA-45*",
"2-LITP-23005*",
"4-OTP-06*")
# This works, but I'm looking for a better way
patterns_string <- paste(patterns, collapse="|")
df <- df %>% mutate(match = grepl(patterns_string, trans))
# Is there any way to keep the patterns in a vector and check whether any of
# them match each row of my data frame, or to use %in% with a wildcard
# character?
# Warning: "argument 'pattern' has length > 1 and only first element will be used"
df <- df %>% mutate(match = grepl(patterns, trans))
# Can't take advantage of the wildcard character '*'
df <- df %>% mutate(match = trans %in% patterns)
Upvotes: 0
Views: 1718
Reputation: 99331
You could run each pattern through grepl() via lapply(), then use Reduce() with the logical "or" operator | to combine the results.
df$match <- Reduce("|", lapply(patterns, grepl, df$trans))
df
# id trans match
# 1 1 1-IA-45045000100 TRUE
# 2 2 2-IA-23003001801 FALSE
# 3 3 3-LITP-01001000100 FALSE
# 4 4 4-OTP-06006000606 TRUE
# 5 5 4-OTP-06010001001 TRUE
# 6 6 1-IA-45001010002 TRUE
# 7 7 2-IA-45045000101 FALSE
# 8 8 2-LITP-23005005002 TRUE
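One caveat worth noting: grepl() treats each pattern as a regular expression, not a wildcard, so "45*" means "a 4 followed by zero or more 5s" and the match is not anchored to the start of the string. It gives the right answer on the sample data, but it may flag other states by accident. The sketch below is one possible way around that, assuming every pattern is a plain shell-style glob; it uses glob2rx() from utils to turn each glob into an anchored regex first. The names regexes, other_state and is_match are made up for illustration.

```r
trans <- c("1-IA-45045000100", "2-IA-23003001801", "3-LITP-01001000100",
           "4-OTP-06006000606", "4-OTP-06010001001", "1-IA-45001010002",
           "2-IA-45045000101", "2-LITP-23005005002")
patterns <- c("1-IA-45*", "2-LITP-23005*", "4-OTP-06*")

# Convert each shell-style wildcard into an anchored regular expression
regexes <- glob2rx(patterns)

# A tract in a different state (46) that the raw pattern matches by accident,
# because "5*" can match zero 5s
other_state <- "1-IA-46001000100"
raw_hit  <- grepl(patterns[1], other_state)  # raw glob used as a regex
safe_hit <- grepl(regexes[1], other_state)   # converted, anchored regex

# Same lapply/Reduce idea as above, but with the converted patterns
is_match <- Reduce(`|`, lapply(regexes, grepl, x = trans))
```

Here raw_hit comes out TRUE while safe_hit is FALSE, and is_match agrees with the match column shown above.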
Upvotes: 4
Reputation: 887108
Here is an option using tidyverse with stri_detect_regex() from stringi:
library(stringi)
library(tidyverse)
patterns %>%
map(~stri_detect_regex(df$trans, .)) %>%
reduce(`|`) %>%
mutate(df, match = .)
# id trans match
#1 1 1-IA-45045000100 TRUE
#2 2 2-IA-23003001801 FALSE
#3 3 3-LITP-01001000100 FALSE
#4 4 4-OTP-06006000606 TRUE
#5 5 4-OTP-06010001001 TRUE
#6 6 1-IA-45001010002 TRUE
#7 7 2-IA-45045000101 FALSE
#8 8 2-LITP-23005005002 TRUE
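Since the question asks for something like %in%, here is a rough base R sketch that avoids regex altogether. It assumes every pattern is a fixed prefix followed by a single trailing *, and that every trans value ends in exactly 11 tract digits, as the question describes. Under those assumptions you can strip the * and compare the state-level and county-level prefixes of trans with plain %in%, which should scale better to a very long pattern list than looping over regexes. The names prefixes, state_key, county_key and is_match are hypothetical.

```r
trans <- c("1-IA-45045000100", "2-IA-23003001801", "3-LITP-01001000100",
           "4-OTP-06006000606", "4-OTP-06010001001", "1-IA-45001010002",
           "2-IA-45045000101", "2-LITP-23005005002")
patterns <- c("1-IA-45*", "2-LITP-23005*", "4-OTP-06*")

# Drop the trailing "*" so each pattern becomes a literal prefix
prefixes <- sub("\\*$", "", patterns)

# Assumption: the last 11 characters are the tract, so removing the last
# 9 characters leaves the state-level key and removing the last 6 leaves
# the county-level key
n <- nchar(trans)
state_key  <- substr(trans, 1, n - 9)
county_key <- substr(trans, 1, n - 6)

# Exact, vectorized lookup with no regex involved
is_match <- state_key %in% prefixes | county_key %in% prefixes
```

On the sample data is_match gives the same result as the match column above. If the patterns can contain wildcards anywhere other than the end, this shortcut does not apply and a regex-based approach is still needed.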
Upvotes: 1