Reputation: 4040
I have a lot of strings like this:
2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0
I want to extract the substring that lies right after the last "/" and ends just before the "_":
556662
I have found out how to extract /01/01/07/556662 by using the following regex: (\/)(.*?)(?=\_)
Please advise how I can capture the right group.
Upvotes: 3
Views: 7052
Reputation: 163362
You could use a capturing group:
/([^_/]+)_[^/\s]*
Explanation
/ - Match a literal forward slash
([^_/]+) - Capture in a group 1 or more chars other than an underscore or forward slash
_[^/\s]* - Match _ and then 0+ times not a forward slash or a whitespace character
One option to get the capturing group might be to get the second column using str_match:
library(stringr)
# Named x rather than str to avoid masking base R's str() function
x <- c("2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0")
str_match(x, "/([^_/]+)_[^/\\s]*")[,2]
# [1] "556662"
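If adding stringr is not an option, roughly the same capture-group lookup can be sketched in base R with regexec and regmatches. This is a minimal sketch, not part of the original answer; the variable names m and id are only illustrative, and perl = TRUE is assumed so that \s is handled by PCRE.

```r
x <- "2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0"

# regexec() records the positions of the whole match and of each group;
# regmatches() turns them into a list with one character vector per input.
m <- regmatches(x, regexec("/([^_/]+)_[^/\\s]*", x, perl = TRUE))

# In each vector, element 1 is the whole match and element 2 is Group 1.
id <- vapply(m, function(parts) parts[2], character(1))
id
```

Picking element 2 here plays the same role as [,2] on the str_match result.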
Upvotes: 5
Reputation: 90
I adapted the regex from Wiktor Stribiżew's answer. Note that regmatches returns the whole match rather than the capture group, so only the sub call yields the desired value.
x <- "2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0"
regmatches(x, regexpr(".*/([0-9]+)", x, perl=TRUE))
## [1] "2019/01/01/07/556662"
sub(".*/([0-9]+).*", "\\1", x)
## [1] "556662"
Upvotes: 0
Reputation: 626870
You may use
x <- "2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0"
regmatches(x, regexpr(".*/\\K[^_]+", x, perl=TRUE))
## [1] "556662"
Here, the regex matches and outputs the first substring that matches
.*/ - any 0+ chars as many as possible up to the last /
\K - omits this part from the match
[^_]+ - puts 1 or more chars other than _ into the match value.
Or, a sub solution:
sub(".*/([^_]+).*", "\\1", x)
Here, it is similar to the previous one, but the 1 or more chars other than _ are captured into Group 1 (\1 in the replacement pattern) and the trailing .* makes sure the whole input is matched (and consumed, ready to be replaced).
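One difference between the two base R approaches shows up on input that does not match. The sketch below uses a hypothetical second string (only the first comes from the question) to illustrate it; the grepl check at the end is one possible way to flag non-matches, not something the original answer prescribes.

```r
# Hypothetical sample inputs; only the first one comes from the question.
x <- c(
  "2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0",
  "no-slash-here"
)

# regmatches() keeps only the elements that matched, so the result can be
# shorter than the input.
via_regmatches <- regmatches(x, regexpr(".*/\\K[^_]+", x, perl = TRUE))

# sub() returns non-matching elements unchanged, so the length is preserved.
via_sub <- sub(".*/([^_]+).*", "\\1", x)

# Flag non-matches explicitly by testing with grepl() first.
ids <- ifelse(grepl(".*/[^_]+", x), via_sub, NA_character_)
ids
```

When the result has to stay aligned with the input vector (for example as a data frame column), the sub variant with an explicit non-match check may be the safer choice.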
Alternative non-base R solutions
If you can afford or prefer to work with stringi, you may use
library(stringi)
stri_match_last_regex("2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0", ".*/([^_]+)")[,2]
## [1] "556662"
This will match a string up to the last / and will capture into Group 1 (that you access in Column 2 using [,2]) 1 or more chars other than _.
Or
stri_extract_last_regex("2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0", "(?<=/)[^_/]+")
## => [1] "556662"
This will extract the last match of a string that consists of 1 or more chars other than _ and / after a /.
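Since the input looks like a slash-separated path, a different technique that avoids the "last /" regex altogether is to take the final path component with base R's basename and then cut at the first underscore. This is a minimal sketch of that alternative, not one of the solutions above, and it assumes the part after the last / always contains an underscore.

```r
x <- "2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0"

# basename() drops everything up to and including the last path separator.
last_part <- basename(x)

# Remove the first underscore and everything after it.
id <- sub("_.*", "", last_part)
id
```

Both functions are vectorized, so the same two lines should also work on a character vector of such strings.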
Upvotes: 5