R User

Reputation: 13

R Extract number from string

I am trying to extract a year from a string. The string looks like this, for example:

Toy Story (1995)

Or it could look like this:

Twelve Monkeys (a.k.a. 12 Monkeys) (1995)

To extract the numbers, I currently use

year = gsub("(?<=\\()[^()]*(?=\\))(*SKIP)(*F)|.", "", x, perl=T)

This works for strings like the first example, but my list also contains strings like the second one, where it returns too much:

[1] 1995
[2] a.k.a. 12 Monkeys1995

I only want the year, not the rest of the text in parentheses. How do I get this?

Upvotes: 1

Views: 3668

Answers (3)

989

Reputation: 12937

If the year is always at the end of each string, enclosed in parentheses, you could do this in base R:

as.numeric(gsub("\\(|\\)", "", substr(x, nchar(x)-5,nchar(x))))
#[1] 1995 1995

Use trimws(x) beforehand in case there is any leading or trailing whitespace.
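
As a minimal sketch of the trimming step, assuming a hypothetical input in which one string carries trailing spaces (the padded value is made up for illustration), the two steps might be combined like this:

```r
# Hypothetical input: the first string has trailing spaces added for illustration
x <- c("Toy Story (1995)  ", "Twelve Monkeys (a.k.a. 12 Monkeys) (1995)")

# Strip leading/trailing whitespace so the last six characters are "(YYYY)"
x_trim <- trimws(x)
n <- nchar(x_trim)

# Take the last six characters, drop the parentheses, convert to numeric
years <- as.numeric(gsub("\\(|\\)", "", substr(x_trim, n - 5, n)))
years
```

Without the trimws() call, the substring for the first element would include the trailing spaces and lose part of the year, so as.numeric() would not return the full year for it.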

Upvotes: 0

akrun

Reputation: 886948

We can use

library(stringr)
as.numeric(str_extract(x, "(?<=\\()[0-9]+(?=\\))"))
#[1] 1995 1995

data

x <-  c("Toy Story (1995)", "Twelve Monkeys (a.k.a. 12 Monkeys) (1995)")

Upvotes: 4

hrbrmstr

Reputation: 78792

stringi::stri_match_last_regex(x, "\\(([[:digit:]]+)\\)")[,2]

Escaping the parens is still a pain, but it's a far more readable regex IMO.
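
If stringi is not available, a similar "last parenthesized number" match could probably be sketched in base R with an anchored sub(). This is only an illustration under the assumption that the year is a four-digit number in the final pair of parentheses; it is not part of the answer above:

```r
x <- c("Toy Story (1995)", "Twelve Monkeys (a.k.a. 12 Monkeys) (1995)")

# Greedy ".*" consumes everything up to the final "(dddd)" at the end of the string;
# the capture group keeps only the four digits.
years <- as.numeric(sub(".*\\(([0-9]{4})\\)[[:space:]]*$", "\\1", x))
years
```

Note that sub() returns the input unchanged when the pattern does not match, so a string without a trailing year would become NA (with a warning) after as.numeric().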

Upvotes: 2
