big parma

Reputation: 337

Regex to remove all numbers before a specific character (working in R)

I'm looking for a regex to remove all numbers before the first appearance of an underscore ( _ ).

Here's an example of a string I want to modify -

"123-abc-123_abc-123_abc_123_abc"

Here's the desired result -

"-abc-_abc-123_abc_123_abc"

I've tried a bunch of things. Positive lookaheads seem like they'd work. For instance I tried this -

str_replace_all("123-abc-123_abc-123_abc_123_abc", "[0-9]*(?=.*_)", "")

But that matches every number that has an underscore anywhere after it, not just the numbers before the first underscore.

Upvotes: 3

Views: 2153

Answers (2)

The fourth bird

Reputation: 163207

Another option is an alternation: either capture everything from the first underscore to the end of the string in group 1, or match 1+ digits.

(_.*$)|\d+


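In the replacement, use the first capturing group (\1). When the digit alternative matches, group 1 is empty, so the digits are dropped; when the underscore alternative matches, the rest of the string is put back unchanged.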

s <- "123-abc-123_abc-123_abc_123_abc"
gsub("(_.*$)|\\d+", "\\1", s)

Result

[1] "-abc-_abc-123_abc_123_abc"
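To see how the alternation behaves on edge cases, here is a minimal sketch that wraps the same gsub call in a helper. The name strip_leading_digits is made up for illustration and is not part of any package.

```r
# Hypothetical helper; it only wraps the gsub call from above.
strip_leading_digits <- function(s) {
  # Group 1 keeps everything from the first underscore onward;
  # digits matched by the second alternative are replaced by an empty \1.
  gsub("(_.*$)|\\d+", "\\1", s)
}

strip_leading_digits("123-abc-123_abc-123_abc_123_abc")
# [1] "-abc-_abc-123_abc_123_abc"

strip_leading_digits("_123")  # starts with an underscore: nothing is removed
strip_leading_digits("a1b2")  # no underscore at all: every digit is removed
```

Note the last case: with no underscore in the input, the pattern strips all digits, which may or may not be what you want.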

Upvotes: 1

Wiktor Stribiżew

Reputation: 626728

You may use

x <- "123-abc-123_abc-123_abc_123_abc"
gsub("\\G([^_\\d]*)\\d", "\\1", x, perl=TRUE)

The regex matches:

  • \G - start of string or end of the previous match
  • ([^_\d]*) - Group 1 (its value is referred to with the \1 placeholder in the replacement pattern): zero or more chars other than a digit or _
  • \d - a digit.
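A small sketch of what \G contributes here: the same pattern is run with and without the anchor (the variable names are just for illustration). Without \G, the regex engine is free to restart after the first underscore, so it ends up removing every digit in the string.

```r
x <- "123-abc-123_abc-123_abc_123_abc"

# With \G, each match must begin exactly where the previous one ended,
# so the chain of matches breaks at the first underscore.
with_anchor <- gsub("\\G([^_\\d]*)\\d", "\\1", x, perl = TRUE)

# Without \G, matching resumes after the underscore and strips all digits.
without_anchor <- gsub("([^_\\d]*)\\d", "\\1", x, perl = TRUE)

print(with_anchor)
# [1] "-abc-_abc-123_abc_123_abc"
print(without_anchor)
# [1] "-abc-_abc-_abc__abc"
```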

Or, use

library(stringr)
x <- "123-abc-123_abc-123_abc_123_abc"
str_replace(x, "\\d[^_]*", function(m) { gsub("\\d", "", m) })
[1] "-abc-_abc-123_abc_123_abc"

The \d[^_]* pattern matches a digit followed by zero or more chars other than _. str_replace only handles the first occurrence, so only the chunk from the first digit up to the first underscore is matched. The function(m) { gsub("\\d", "", m) } replacement then returns that chunk with all digits removed.
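If you'd rather not depend on stringr, the same idea (only touch the part before the first underscore) can be sketched in base R by splitting the string at the first underscore. This is a different technique from the regex callback above, and the function name is made up for illustration.

```r
# Hypothetical helper, base R only.
strip_digits_before_first_underscore <- function(x) {
  # Position of the first underscore, or -1 if there is none
  pos <- as.integer(regexpr("_", x, fixed = TRUE))
  # Length of the prefix to clean; the whole string when no underscore exists
  cut <- ifelse(pos == -1L, nchar(x), pos - 1L)
  prefix <- substr(x, 1, cut)
  rest <- substr(x, cut + 1, nchar(x))
  # Remove digits from the prefix only, then glue the pieces back together
  paste0(gsub("[0-9]", "", prefix), rest)
}

strip_digits_before_first_underscore("123-abc-123_abc-123_abc_123_abc")
# [1] "-abc-_abc-123_abc_123_abc"
```

As written, a string with no underscore has all of its digits removed; adjust the cut logic if you want such strings left alone.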


Upvotes: 3
