Reputation: 13
I am trying to extract a year from a string. The string can look like this, for example:
Toy Story (1995)
Or it could look like this
Twelve Monkeys (a.k.a. 12 Monkeys) (1995)
To extract the numbers, I currently use
year = gsub("(?<=\\()[^()]*(?=\\))(*SKIP)(*F)|.", "", x, perl=T)
This works when every string has the first form, but my list also contains strings of the second form, and then I get:
[1] 1995
[2] a.k.a. 12 Monkeys1995
I do not want the extra text, only the year. How do I get this?
Upvotes: 1
Views: 3668
Reputation: 12937
If the year is always at the end of each string and enclosed in parentheses, you could do this in base R:
as.numeric(gsub("\\(|\\)", "", substr(x, nchar(x)-5,nchar(x))))
#[1] 1995 1995
Use trimws(x) beforehand in case there is any leading or trailing whitespace.
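As a minimal sketch of the trimming step combined with the substr approach, plus a regex variant that does not depend on character positions: the trailing spaces in the sample data and the names x_trim, year_substr and year_regex are assumptions added for illustration, not part of the original question.

```r
# Sample input; the trailing spaces on the second element are an
# assumption added for illustration.
x <- c("Toy Story (1995)", "Twelve Monkeys (a.k.a. 12 Monkeys) (1995)  ")

# Drop leading/trailing whitespace so the year sits at the very end.
x_trim <- trimws(x)

# Take the last six characters, e.g. "(1995)", then strip the parentheses.
n <- nchar(x_trim)
year_substr <- as.numeric(gsub("\\(|\\)", "", substr(x_trim, n - 5, n)))

# Variant: capture four digits inside the final pair of parentheses,
# allowing optional whitespace before the end of the string.
year_regex <- as.numeric(sub(".*\\(([0-9]{4})\\)[[:space:]]*$", "\\1", x))
```

The regex variant assumes a four-digit year; if a string has no such group, sub returns it unchanged and as.numeric would then give NA with a warning.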
Upvotes: 0
Reputation: 886948
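We can use str_extract from stringr, with lookarounds so that only digits directly enclosed in parentheses are matched: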
library(stringr)
as.numeric(str_extract(x, "(?<=\\()[0-9]+(?=\\))"))
#[1] 1995 1995
# data
x <- c("Toy Story (1995)", "Twelve Monkeys (a.k.a. 12 Monkeys) (1995)")
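If you would rather avoid the stringr dependency, the same lookaround pattern should also work in base R with perl = TRUE. This is only a sketch under that assumption; the names hit and years are made up for the example.

```r
x <- c("Toy Story (1995)", "Twelve Monkeys (a.k.a. 12 Monkeys) (1995)")

# regexpr finds the first match per string; perl = TRUE enables lookarounds.
hit <- regexpr("(?<=\\()[0-9]+(?=\\))", x, perl = TRUE)

# Pre-fill with NA so strings without a match keep their position.
years <- rep(NA_real_, length(x))
years[hit > 0] <- as.numeric(regmatches(x, hit))
years
```

The "12" in "(a.k.a. 12 Monkeys)" is skipped because it is neither directly preceded by "(" nor directly followed by ")".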
Upvotes: 4
Reputation: 78792
stringi::stri_match_last_regex(x, "\\(([[:digit:]]+)\\)")[,2]
Escaping the parens is still a pain, but it's a far more readable regex IMO.
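For anyone without stringi installed, a rough base R approximation of "take the last parenthesised number" could look like the sketch below. The helper name last_year is hypothetical, and this is not a drop-in replacement for stri_match_last_regex.

```r
x <- c("Toy Story (1995)", "Twelve Monkeys (a.k.a. 12 Monkeys) (1995)")

# Find every "(digits)" group in each string; gregexpr returns all matches.
m <- regmatches(x, gregexpr("\\([[:digit:]]+\\)", x))

# Keep the last match per string (NA if none), then strip the parentheses.
last_year <- function(hits) {
  if (length(hits) == 0) return(NA_character_)
  gsub("[()]", "", hits[length(hits)])
}

years <- as.numeric(vapply(m, last_year, character(1)))
years
```

Taking the last match rather than the first helps when an earlier parenthesised group also consists only of digits.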
Upvotes: 2