user9302275

Removing parentheses, text preceding comma, and the comma in a string using string

I have a string that contains a person's name and city. It's formatted like this:

mock <- "Joe Smith (Cleveland, OH)"

I simply want the state abbreviation to remain, so in this case the only remaining string would be "OH".

I can get rid of the parentheses and comma with:

[(.*?),]

Which gives me:

"Joe Smith Cleveland OH"

But I can't figure out how to combine all of it. For the record, all of the records will look like that, ending with a comma, a space, and a two-letter capital state abbreviation (e.g. ", OH", ", KY", ", MD").

Upvotes: 1

Views: 68

Answers (3)

G. Grothendieck

Reputation: 269624

If the general case is that the state occupies the third-to-last and second-to-last characters, then match everything with .*, followed by a capture group of two characters (..) and one more character ., and replace the whole match with the capture group:

sub(".*(..).", "\\1", mock)
## [1] "OH"
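
Since sub() is vectorized, the same call should work on a whole column of such strings. A minimal sketch, assuming a few made-up records (the names and cities below are hypothetical, not from the question):

```r
# Hypothetical records in the format described in the question
records <- c("Joe Smith (Cleveland, OH)",
             "Ann Lee (Louisville, KY)",
             "Sam Roe (Baltimore, MD)")

# .* greedily consumes everything except the last three characters,
# (..) captures the two characters before the final one,
# and the last . matches the closing parenthesis
states <- sub(".*(..).", "\\1", records)
print(states)
## [1] "OH" "KY" "MD"
```

Note that this relies purely on position, so any trailing whitespace after the closing parenthesis would shift the result.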

Upvotes: 0

Brad

Reputation: 119

How about this? If they are all formatted the same, this should work:

mock <- "Joe Smith (Cleveland, OH)"
substr(mock, (nchar(mock) - 2), (nchar(mock) - 1))
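
The positional approach also extends to a vector, because substr() recycles its start and stop arguments. A small sketch, assuming hypothetical records and assuming stray trailing whitespace is possible (trimws() is applied first for that reason):

```r
# Hypothetical records; the last one has trailing whitespace on purpose
records <- c("Joe Smith (Cleveland, OH)",
             "Ann Lee (Louisville, KY)",
             "Sam Roe (Baltimore, MD)  ")

# Strip surrounding whitespace so the closing parenthesis is the last character
clean <- trimws(records)

# Take the two characters immediately before the final ")"
n <- nchar(clean)
states <- substr(clean, n - 2, n - 1)
print(states)
## [1] "OH" "KY" "MD"
```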

Upvotes: 1

Wiktor Stribiżew

Reputation: 626845

You may use

mock <- "Joe Smith (Cleveland, OH)"
sub(".+,\\s*([A-Z]{2})\\)$", "\\1", mock)
## => [1] "OH"
## With stringr:
library(stringr)  # provides str_extract()
str_extract(mock, "[A-Z]{2}(?=\\)$)")
## => [1] "OH"

Details

  • .+,\\s*([A-Z]{2})\\)$ - matches any 1+ chars, as many as possible, then a comma, 0+ whitespace chars, then captures 2 uppercase ASCII letters into Group 1 (referred to with \\1 in the replacement pattern), and then matches ) at the end of the string
  • [A-Z]{2}(?=\\)$) - matches 2 uppercase ASCII letters only if they are followed by ) at the end of the string.
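
One thing to keep in mind with the sub() variant: when the pattern does not match, sub() returns the input unchanged rather than a missing value. A hedged base R sketch of one way to guard against that, using made-up records (the third one deliberately lacks the expected suffix):

```r
# Hypothetical records; the last one does not follow the expected format
records <- c("Joe Smith (Cleveland, OH)",
             "Ann Lee (Louisville, KY)",
             "No state here")

pattern <- ".+,\\s*([A-Z]{2})\\)$"

# Flag which records actually match, then extract only from those;
# non-matching records become NA instead of passing through untouched
matched <- grepl(pattern, records)
states <- ifelse(matched, sub(pattern, "\\1", records), NA_character_)
# states is now c("OH", "KY", NA)
```

The stringr call shown above already behaves this way, since str_extract() returns NA for strings without a match.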

Upvotes: 1
