Ashirwad

Reputation: 2040

Extracting first four digits after a hyphen using stringr

I am trying to extract the first four digits after a hyphen in the following string: extract_public_2018_20190530180949469_58906_20110101-20111231Texas. I am using the following code:

stringr::str_extract(
  "extract_public_2018_20190530180949469_58906_20110101-20111231Texas",
  "-[[:digit:]]{4}"
)

But I get -2011 instead of 2011. How can I extract only the four digits, without the hyphen?

Upvotes: 5

Views: 1139

Answers (3)

Ronak Shah

Reputation: 388982

In base R, we can use sub() with a capture group to keep only the four digits after the hyphen.

string <- "extract_public_2018_20190530180949469_58906_20110101-20111231Texas"
sub(".*-(\\d{4}).*", "\\1", string)
# [1] "2011"
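One thing to keep in mind with this approach: when the pattern does not match, sub() returns the input unchanged rather than NA, and the greedy .* anchors on the last hyphen if there are several. A minimal sketch of a guard, assuming you would rather get NA for strings with no match (extract_after_hyphen is a hypothetical helper name, and the second input is made up for illustration):

```r
# Hypothetical helper: returns NA when there is no hyphen followed by four digits.
extract_after_hyphen <- function(x) {
  pattern <- ".*-(\\d{4}).*"
  # grepl() checks for a match first, because sub() alone would
  # hand back the original string untouched on a non-match.
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

extract_after_hyphen(c(
  "extract_public_2018_20190530180949469_58906_20110101-20111231Texas",
  "no_hyphen_here"  # made-up input with no hyphen
))
# [1] "2011" NA
```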

Upvotes: 2

neilfws

Reputation: 33782

str_extract is behaving as expected, i.e., it returns the complete match, hyphen included.

You can use str_match instead and wrap the digits in a capture group, (), in the pattern:

stringr::str_match(
  "extract_public_2018_20190530180949469_58906_20110101-20111231Texas", 
  "-([[:digit:]]{4})"
)

     [,1]    [,2]  
[1,] "-2011" "2011"

Then add [, 2] to return just the captured group (the second column):

stringr::str_match(
  "extract_public_2018_20190530180949469_58906_20110101-20111231Texas", 
  "-([[:digit:]]{4})"
)[, 2]

[1] "2011"
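The same call works on a whole vector of strings, which may be closer to a real use case. A small sketch, assuming a character vector of file names (files and years are illustrative names, and the second element is made up to show the no-match case):

```r
library(stringr)

# Only the first element comes from the question; the second is invented.
files <- c(
  "extract_public_2018_20190530180949469_58906_20110101-20111231Texas",
  "extract_public_2019_no_hyphen_here"
)

# Column 1 holds the full match, column 2 the first capture group.
# Rows without a match come back as NA.
years <- str_match(files, "-([[:digit:]]{4})")[, 2]
years
# [1] "2011" NA
```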

Upvotes: 2

r2evans

Reputation: 160447

Use a regex lookbehind, a zero-width assertion that requires something to appear before your pattern without including it in the match:

stringr::str_extract(
  "extract_public_2018_20190530180949469_58906_20110101-20111231Texas", 
  "(?<=-)[[:digit:]]{4}"
)
# [1] "2011"
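If you would rather not depend on stringr, the same lookbehind can be used in base R. A minimal sketch: base R's default regex engine does not support lookbehind, so perl = TRUE is needed here (x and m are illustrative names):

```r
x <- "extract_public_2018_20190530180949469_58906_20110101-20111231Texas"

# regexpr() locates the first match; perl = TRUE enables the (?<=...) lookbehind.
m <- regexpr("(?<=-)[[:digit:]]{4}", x, perl = TRUE)

# regmatches() pulls out the matched text at those positions.
regmatches(x, m)
# [1] "2011"
```

Note that regmatches() silently drops elements that do not match, so on a vector the result can be shorter than the input, whereas str_extract() returns NA in those positions.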

Upvotes: 5
