Reputation:
I have a string that contains a person's name and city. It's formatted like this:
mock <- "Joe Smith (Cleveland, OH)"
I simply want only the state abbreviation to remain, so in this case the resulting string would be "OH".
I can get rid of the parentheses and comma with this pattern:
[(.*?),]
Which gives me:
"Joe Smith Cleveland OH"
But I can't figure out how to combine all of it. For the record, all of the records will look like that, ending with a comma, a space, and a two-letter capital state abbreviation (e.g. ", OH", ", KY", ", MD").
Upvotes: 1
Views: 68
Reputation: 269624
If the general case is that the state is in the second and third last characters, then match everything with ".*", then a capture group of two characters, "(..)", then one more character, ".", and replace the whole match with the capture group:
sub(".*(..).", "\\1", mock)
## [1] "OH"
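Since sub() is vectorised, the same call should work on a whole vector of records. The following is only a sketch, assuming every record ends with the two-letter state followed by a single closing parenthesis; the `records` vector is made-up sample data, not from the question.

```r
# Hypothetical sample records in the "Name (City, ST)" format from the question
records <- c("Joe Smith (Cleveland, OH)",
             "Ann Lee (Louisville, KY)",
             "Bob Ray (Baltimore, MD)")

# The greedy .* consumes everything except the last three characters:
# (..) captures the two characters before the final one, and the last . matches ")"
states <- sub(".*(..).", "\\1", records)
print(states)
```

Note that this is purely positional: it does not check that the captured characters are letters, so a record with trailing whitespace would give the wrong result.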
Upvotes: 0
Reputation: 119
How about this? If they are all formatted the same, then this should work:
mock <- "Joe Smith (Cleveland, OH)"
substr(mock, (nchar(mock) - 2), (nchar(mock) - 1))
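If this needs to run over many records, the same idea can be wrapped in a small function, since nchar() and substr() are both vectorised. This is a rough sketch: `state_from_end` is a hypothetical helper name, and the trimws() call is an added assumption to guard against stray trailing whitespace.

```r
# Hypothetical helper: return the 2nd- and 3rd-from-last characters of each string
state_from_end <- function(x) {
  x <- trimws(x)          # assumption: drop surrounding whitespace before counting
  n <- nchar(x)
  substr(x, n - 2, n - 1) # skip the final ")" and take the two characters before it
}

result <- state_from_end(c("Joe Smith (Cleveland, OH)", "Ann Lee (Louisville, KY)"))
print(result)
```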
Upvotes: 1
Reputation: 626845
You may use
mock <- "Joe Smith (Cleveland, OH)"
sub(".+,\\s*([A-Z]{2})\\)$","\\1",mock)
## => [1] "OH"
## With stringr:
library(stringr)
str_extract(mock, "[A-Z]{2}(?=\\)$)")
Details
- .+,\\s*([A-Z]{2})\\)$ - matches any 1+ chars as many as possible, then a comma, then 0+ whitespaces, then captures 2 uppercase ASCII letters into Group 1 (referred to with \1 from the replacement pattern), and then matches ) at the end of the string
- [A-Z]{2}(?=\)$) - matches 2 uppercase ASCII letters if followed by ) at the end of the string.
Upvotes: 1
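One thing to keep in mind with the sub() approach is that sub() returns its input unchanged when the pattern does not match. A possible way to handle that, sketched here with made-up sample data, is to test with grepl() first and return NA for records that do not fit the expected format.

```r
# Hypothetical sample data: the second record does not follow the expected format
records <- c("Joe Smith (Cleveland, OH)", "No state here")
pattern <- ".+,\\s*([A-Z]{2})\\)$"

# Extract the state where the pattern matches; otherwise return NA
# instead of the unchanged input string that sub() alone would give
states <- ifelse(grepl(pattern, records),
                 sub(pattern, "\\1", records),
                 NA_character_)
print(states)
```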