Reputation: 85
I have a vector of character strings:
x <- c(
"\nFolsom Field, University of Colorado, Boulder, CO (9/3/72)",
"\nHollywood Palladium, Hollywood, CA (9/9/72)"
)
And I want to extract the event location, city, state, and date. I have figured out the event location, city, and date, but cannot correctly match the state. The issue I am having is that I need to match after the second or third comma and before the first parenthesis.
I tried:
stateLoc <- regexpr(",{2,}.+?\\(", x)
state <- regmatches(x, stateLoc)
but that returned an empty character vector.
Any input is appreciated, thank you.
Upvotes: 1
Views: 1616
Reputation: 626738
You may extract these details using a single str_match call:
library(stringr)
x <- c("\nFolsom Field, University of Colorado, Boulder, CO (9/3/72)","\nHollywood Palladium, Hollywood, CA (9/9/72)")
> res <- str_match(x, "\\s*([^,]*),\\s*([A-Z]+)\\s*\\(([0-9/]+)\\)")
> res[,2]
[1] "Boulder" "Hollywood"
> res[,3]
[1] "CO" "CA"
> res[,4]
[1] "9/3/72" "9/9/72"
See the regex demo online.
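For readers who prefer to stay in base R, as in the question, the same extraction can be sketched with regexec() and regmatches(). This is only a sketch under the assumption that every string contains a match; the names pattern, matches, parts, city, state and date are illustrative and not part of the original answer. perl = TRUE is set so that the pattern is evaluated by PCRE rather than R's default engine.

```r
x <- c(
  "\nFolsom Field, University of Colorado, Boulder, CO (9/3/72)",
  "\nHollywood Palladium, Hollywood, CA (9/9/72)"
)

# Same pattern as in the str_match() call above
pattern <- "\\s*([^,]*),\\s*([A-Z]+)\\s*\\(([0-9/]+)\\)"

# regexec() records the positions of the full match and of each capture group;
# regmatches() turns them into one character vector per input string
matches <- regmatches(x, regexec(pattern, x, perl = TRUE))

# Stack the per-string vectors into a matrix:
# column 1 = full match, 2 = city, 3 = state, 4 = date
# (assumes every string matched; unmatched strings yield character(0))
parts <- do.call(rbind, matches)
city  <- parts[, 2]
state <- parts[, 3]
date  <- parts[, 4]
```

The columns line up with res[,2], res[,3] and res[,4] from the str_match() result.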
Details
\\s* - 0+ whitespaces
([^,]*) - Capturing group 1: any 0 or more chars other than a comma
, - a comma
\\s* - 0+ whitespaces
([A-Z]+) - Capturing group 2: 1 or more uppercase letters
\\s* - 0+ whitespaces
\\( - a ( char
([0-9/]+) - Capturing group 3: 1 or more digits or slashes
\\) - a ) char.
Upvotes: 1
Reputation: 3060
This regex worked for me:
library(stringr)
x <- c(
"\nFolsom Field, University of Colorado, Boulder, CO (9/3/72)",
"\nHollywood Palladium, Hollywood, CA (9/9/72)",
"\nThe Spectrum, Philadelphia, PA (5/1/2010) "
)
## str_trim() removes the leading and trailing whitespace that the two \\s in the pattern capture
states <- str_trim(str_extract(x, "\\s[A-Z]{1,2}\\s"))
states
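A possible variant, sketched here in base R with the regexpr()/regmatches() pair from the question, ties the state to the opening parenthesis with a lookahead, so no trimming is needed and a stray one-letter word such as "A " cannot match. It assumes the state is always a two-letter uppercase code directly before the date; the names state_pos and states_alt are illustrative only.

```r
x <- c(
  "\nFolsom Field, University of Colorado, Boulder, CO (9/3/72)",
  "\nHollywood Palladium, Hollywood, CA (9/9/72)",
  "\nThe Spectrum, Philadelphia, PA (5/1/2010) "
)

# Two uppercase letters, matched only when followed by optional
# whitespace and "(" (assumption: states are two-letter codes).
# The lookahead (?=...) requires perl = TRUE in base R.
state_pos  <- regexpr("[A-Z]{2}(?=\\s*\\()", x, perl = TRUE)
states_alt <- regmatches(x, state_pos)
states_alt
```

Because the lookahead does not consume any characters, the match contains only the state code itself.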
Upvotes: 1