Reputation: 845
I am trying to match the county name of a state in a string.
strings <- c("High School Graduate or Higher (5-year estimate) in Jefferson Parish, LA"
,"High School Graduate or Higher (5-year estimate) in Jefferson Davis Parish, LA")
countyName <- "Jefferson"
stateAbb <- "LA"
test <- gregexpr(paste0(countyName," (\\w), ",stateAbb,"$"),strings,ignore.case=T,perl=T)
I cannot get test to actually return any matches.
The code works if I replace \\w with .*, but then "Jefferson" will also match lines with "Jefferson Davis".
Of course, when the county name is actually "Jefferson Davis", I want to match "Jefferson Davis".
Upvotes: 1
Views: 65
Reputation: 627600
Your current regex only matches a single "word" char (that is, a letter, digit or _ symbol) after the countyName.
To make it match 1 or more "word" chars, add a + quantifier to \w:
test <- gregexpr(paste0(countyName," (\\w+), ",stateAbb,"$"),strings,ignore.case=T,perl=T)
                                         ^
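A quick way to check the corrected pattern is a minimal sketch like the one below. It reuses the strings and stateAbb from the question, but swaps gregexpr for grepl so each string gives a plain TRUE/FALSE. The helper name matchesCounty is hypothetical and not part of the original post.

```r
strings <- c(
  "High School Graduate or Higher (5-year estimate) in Jefferson Parish, LA",
  "High School Graduate or Higher (5-year estimate) in Jefferson Davis Parish, LA"
)
stateAbb <- "LA"

# Hypothetical helper (an assumption for illustration): builds the pattern
# for one county name and tests every string against it.
matchesCounty <- function(countyName) {
  pattern <- paste0(countyName, " (\\w+), ", stateAbb, "$")
  grepl(pattern, strings, ignore.case = TRUE, perl = TRUE)
}

matchesCounty("Jefferson")        # TRUE FALSE
matchesCounty("Jefferson Davis")  # FALSE TRUE
```

Because \w+ cannot cross a space, "Jefferson" no longer matches the "Jefferson Davis Parish" line, while "Jefferson Davis" still matches it.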
The resulting regex will look like
Jefferson (\w+), LA$
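If the captured word is actually needed, one possible approach is regexec together with regmatches, which returns the full match followed by each capturing group. This is a sketch with the county and state hard-coded for brevity, and it uses regexec rather than the gregexpr call from the question.

```r
strings <- c(
  "High School Graduate or Higher (5-year estimate) in Jefferson Parish, LA",
  "High School Graduate or Higher (5-year estimate) in Jefferson Davis Parish, LA"
)

# Build the pattern the same way as in the answer, then print it as the
# regex engine sees it (a single backslash before w).
pattern <- paste0("Jefferson", " (\\w+), ", "LA", "$")
writeLines(pattern)  # Jefferson (\w+), LA$

# regexec records the position of the whole match and of each group;
# regmatches turns those positions into substrings.
m <- regexec(pattern, strings, ignore.case = TRUE)
hits <- regmatches(strings, m)

hits[[1]]          # "Jefferson Parish, LA" "Parish"
length(hits[[2]])  # 0, because the second string does not match
```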
Details:
Jefferson - a literal substring
" " - a space
(\w+) - a capturing group matching 1 or more letters, digits or _ symbols (perhaps you do not even need it; remove ( and ) if you do not need to access this submatch)
", " - a comma and then a space
LA - a literal substring
$ - end of string.
Upvotes: 1