R - regular expression, match after second or third occurrence

Question

I have a vector of character strings:
x <- c( "\nFolsom Field, University of Colorado, Boulder, CO (9/3/72)", "\nHollywood Palladium, Hollywood, CA (9/9/72)" )

I want to extract the event location, city, state, and date. I have figured out the event location, city, and date, but cannot correctly match the state. The issue I am having is that I need to match after the second or third comma and before the first parenthesis.

I tried:
stateLoc <- regexpr(",{2,}.+?\\(", x)
state <- regmatches(x, stateLoc)
but that returned an empty character vector.

Any input is appreciated, thank you.

Wiktor Stribiżew · Accepted Answer

You may extract these details using a single str_match call:

> library(stringr)
> x <- c("\nFolsom Field, University of Colorado, Boulder, CO (9/3/72)","\nHollywood Palladium, Hollywood, CA (9/9/72)")
> res <- str_match(x, "\\s*([^,]*),\\s*([A-Z]+)\\s*\\(([0-9/]+)\\)")
> res[,2]
[1] "Boulder"   "Hollywood"
> res[,3]
[1] "CO" "CA"
> res[,4]
[1] "9/3/72" "9/9/72"

See the regex demo online.

Details

\s* - 0+ whitespaces
([^,]*) - Capturing group 1: any 0 or more chars other than a comma
, - a comma
\s* - 0+ whitespaces
([A-Z]+) - Capturing group 2: 1 or more uppercase letters
\s* - 0+ whitespaces
\( - a ( char
([0-9/]+) - Capturing group 3: 1 or more digits or slashes
\) - a ) char.
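
If you would rather not load stringr, the same pattern should also work with base R. The following is only a sketch under that assumption: it uses regexec() and regmatches() to get the capture groups, and the variable names (pattern, m, city, state, date) are just illustrative. Note that inside an R string every regex backslash has to be doubled.

```r
x <- c("\nFolsom Field, University of Colorado, Boulder, CO (9/3/72)",
       "\nHollywood Palladium, Hollywood, CA (9/9/72)")

# Same regex as in the breakdown above, with backslashes doubled for R
pattern <- "\\s*([^,]*),\\s*([A-Z]+)\\s*\\(([0-9/]+)\\)"

# regexec() locates the full match and each capture group;
# regmatches() returns one character vector per input string
m <- regmatches(x, regexec(pattern, x))

# Stack the per-string results into a matrix:
# column 1 is the full match, columns 2-4 are the three groups
res <- do.call(rbind, m)

city  <- res[, 2]
state <- res[, 3]
date  <- res[, 4]
```

The resulting matrix has the same layout as the str_match() output, so res[, 3] holds the state. One caveat: regmatches() returns character(0) for any string that does not match, so the rbind() step assumes every element of x matches the pattern.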
