Reputation: 337
I'm looking for a regex that removes all numbers before the first appearance of an underscore ( _ ).
Here's an example of a string I want to modify -
"123-abc-123_abc-123_abc_123_abc"
Here's the desired result -
"-abc-_abc-123_abc_123_abc"
I've tried a bunch of things. Positive lookaheads seem like they'd work. For instance I tried this -
str_replace_all("123-abc-123_abc-123_abc_123_abc", "[0-9]*(?=.*_)", "")
But that matches every number that is followed by an underscore anywhere later in the string, not just the numbers before the first underscore.
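To make the failure concrete, here is a minimal base-R sketch of the same attempt. It assumes that `gsub(..., perl = TRUE)` behaves like the `str_replace_all` call above for this pattern; the variable names `x` and `attempt` are only illustrative.

```r
# Input string from the question.
x <- "123-abc-123_abc-123_abc_123_abc"

# Same pattern as the attempt; perl = TRUE is required for the lookahead.
attempt <- gsub("[0-9]*(?=.*_)", "", x, perl = TRUE)

# Every run of digits here has an underscore somewhere after it,
# so the lookahead succeeds for all of them and all four runs are removed.
print(attempt)
```

The lookahead only checks that some underscore exists later in the string; it says nothing about whether an underscore has already been passed.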
Upvotes: 3
Views: 2153
Reputation: 163207
Another option is an alternation: either capture everything from the first underscore to the end of the string in group 1, or match 1+ digits.
(_.*$)|\d+
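In the replacement, use the first capturing group. When the digit branch matches, group 1 is empty and the digits are removed; when the underscore branch matches, the rest of the string is put back unchanged.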
s <- "123-abc-123_abc-123_abc_123_abc"
gsub("(_.*$)|\\d+", "\\1", s)
Result
[1] "-abc-_abc-123_abc_123_abc"
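To see how the alternation consumes the string, the matches can be listed one by one. This is a small sketch using base R's `gregexpr` and `regmatches`; the names `pattern` and `matches` are illustrative, not part of the answer.

```r
s <- "123-abc-123_abc-123_abc_123_abc"
pattern <- "(_.*$)|\\d+"

# Extract every match of the pattern, in order.
matches <- regmatches(s, gregexpr(pattern, s))[[1]]
print(matches)

# The digit runs before the first underscore match via \d+ (group 1 empty),
# and the last match swallows everything from the first underscore onward,
# so later digits are never reached by the \d+ branch.
result <- gsub(pattern, "\\1", s)
print(result)
```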
Upvotes: 1
Reputation: 626728
You may use
x <- "123-abc-123_abc-123_abc_123_abc"
gsub("\\G([^_\\d]*)\\d", "\\1", x, perl=TRUE)
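See regex demo. The regex matches
\G - start of string or end of the previous match
([^_\d]*) - Group 1 (its value is referred to with the \1 placeholder from the replacement pattern): any 0+ chars other than a digit or _
\d - a digit.
Because each match must start where the previous one ended and can never cross a _, matching stops at the first underscore.
Or, use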
library(stringr)
x <- "123-abc-123_abc-123_abc_123_abc"
str_replace(x, "\\d[^_]*", function(m) { gsub("\\d", "", m) })
[1] "-abc-_abc-123_abc_123_abc"
The \d[^_]* pattern will match a digit and all 0 or more chars other than _ after it. str_replace will only handle the first occurrence, replacing the match with its copy from which all digits are removed by means of function(m) { gsub("\\d", "", m) }.
See the R demo online
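The same "edit only the part before the first underscore" idea can also be sketched in base R without stringr. This is not taken from the answer above: it swaps in `regexpr` plus the `regmatches<-` replacement form, and the helper name `strip_prefix_digits` is hypothetical. Note one assumption: if the string contains no underscore at all, the whole string counts as the prefix and every digit is removed.

```r
# Hypothetical helper: remove digits only in the text before the first "_".
strip_prefix_digits <- function(x) {
  # Locate the leading run of non-underscore characters.
  m <- regexpr("^[^_]*", x)
  # Replace that run with a copy of itself that has its digits removed.
  regmatches(x, m) <- gsub("\\d", "", regmatches(x, m))
  x
}

x <- "123-abc-123_abc-123_abc_123_abc"
print(strip_prefix_digits(x))
```

Splitting the task into "find the prefix" and "clean the prefix" keeps both patterns simple, at the cost of two regex passes.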
Upvotes: 3