Reputation: 2040
I am trying to extract the first four digits after a hyphen in the following string: extract_public_2018_20190530180949469_58906_20110101-20111231Texas. I am using the following code:
stringr::str_extract(
"extract_public_2018_20190530180949469_58906_20110101-20111231Texas",
"-[[:digit:]]{4}"
)
But I get -2011 instead of 2011. How can I extract only the four digits and not the hyphen?
Upvotes: 5
Views: 1139
Reputation: 388982
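In base R, we can use sub() to extract the four digits after the hyphen. The pattern matches the whole string and replaces it with the contents of the capture group: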
string <- "extract_public_2018_20190530180949469_58906_20110101-20111231Texas"
sub(".*-(\\d{4}).*", "\\1", string)
#[1] "2011"
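One caveat with this approach: sub() returns its input unchanged when the pattern does not match, so a string without a hyphen comes back whole rather than as NA. A minimal sketch of guarding against that follows; the helper name first4_after_hyphen is made up for illustration and is not part of the original answer.

```r
# Hypothetical helper (the name is an assumption, not from the answer above).
first4_after_hyphen <- function(x) {
  pattern <- ".*-(\\d{4}).*"
  # sub() leaves non-matching strings untouched, so return NA for those instead
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

first4_after_hyphen(c(
  "extract_public_2018_20190530180949469_58906_20110101-20111231Texas",
  "no_hyphen_here"
))
#[1] "2011" NA
```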
Upvotes: 2
Reputation: 33782
str_extract is behaving as expected, i.e. it returns the complete match. You can use str_match and include () in the pattern to capture just the digits:
stringr::str_match(
"extract_public_2018_20190530180949469_58906_20110101-20111231Texas",
"-([[:digit:]]{4})"
)
     [,1]    [,2]
[1,] "-2011" "2011"
Then add [, 2] to return just the captured group:
stringr::str_match(
"extract_public_2018_20190530180949469_58906_20110101-20111231Texas",
"-([[:digit:]]{4})"
)[, 2]
[1] "2011"
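If stringr is not available, the same capture-group idea can be sketched in base R with regexec() and regmatches(). This is an alternative to the str_match call above, not what it does internally:

```r
string <- "extract_public_2018_20190530180949469_58906_20110101-20111231Texas"

# regexec() records the full match plus each parenthesised group;
# regmatches() returns them as one character vector per input string.
m <- regmatches(string, regexec("-([[:digit:]]{4})", string))

# Element 1 is the full match ("-2011"), element 2 is the first capture group.
m[[1]][2]
#[1] "2011"
```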
Upvotes: 2
Reputation: 160447
Use a regex lookbehind, a zero-width assertion that requires something to appear before your pattern without including it in the match:
stringr::str_extract(
"extract_public_2018_20190530180949469_58906_20110101-20111231Texas",
"(?<=-)[[:digit:]]{4}"
)
# [1] "2011"
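The same lookbehind also works in base R, provided perl = TRUE is set, because the default regex engine does not support lookaround. A minimal sketch using regexpr() and regmatches():

```r
string <- "extract_public_2018_20190530180949469_58906_20110101-20111231Texas"

# perl = TRUE switches to the PCRE engine, which understands (?<=...).
# regexpr() locates the first match; regmatches() extracts the matched text.
res <- regmatches(string, regexpr("(?<=-)[[:digit:]]{4}", string, perl = TRUE))
res
# [1] "2011"
```

Note that regmatches() drops elements with no match when used with regexpr(), so on a vector input the result can be shorter than the input.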
Upvotes: 5