Jenn

Reputation: 13

Regular expression for recurring digits in Python

I am trying to replace every run of digits in my data source with "numbr", including runs that are joined by hyphens. Some examples:

  1. 1234-546-234235-1232-1242-123124 -> numbr
  2. 125436 -> numbr
  3. abc1231241 -> abcnumbr

I have tried re.sub(r'(\d+[/-]*\d+)(R?)', "numbr", token), but it does not replace example 1 correctly. Any idea what I am missing?

Upvotes: 0

Views: 66

Answers (1)

ctwheels

Reputation: 22837

Code

See regex in use here

(?:\d-\d|\d)+

An alternative, (?:\d(?:-\d)?)+, can also be used, but the regex engine needs one extra step to complete the match.


Results

Input

1234-546-234235-1232-1242-123124
125436
abc1231241

Output

numbr
numbr
abcnumbr
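For reference, here is a minimal Python sketch of how the pattern could be applied with re.sub to the three inputs above. The helper name replace_digits is hypothetical and not part of the original answer. The last two lines run the pattern from the question on example 1 for comparison: it matches only two hyphen-joined groups at a time, so the hyphens between matches are left behind.

```python
import re

# Pattern from the answer: digits, optionally joined by single hyphens.
DIGITS = re.compile(r"(?:\d-\d|\d)+")


def replace_digits(token: str) -> str:
    """Replace each digit run (hyphens between digits included) with 'numbr'.

    Hypothetical helper name, used here only for illustration.
    """
    return DIGITS.sub("numbr", token)


inputs = ["1234-546-234235-1232-1242-123124", "125436", "abc1231241"]
outputs = [replace_digits(t) for t in inputs]
print(outputs)  # ['numbr', 'numbr', 'abcnumbr']

# The pattern from the question, for comparison: each match covers at most
# two digit groups, so the hyphens between matches survive.
attempt = re.sub(r"(\d+[/-]*\d+)(R?)", "numbr", inputs[0])
print(attempt)  # numbr-numbr-numbr
```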

Explanation

  • (?:\d-\d|\d)+ Match either of the following one or more times
    • \d-\d Match a digit, followed by a hyphen -, followed by a digit
    • \d Match a digit
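The breakdown above can be written out as a commented pattern using Python's re.VERBOSE flag. This is only a sketch; the name DIGIT_RUN is made up for illustration. One consequence of the \d-\d alternative worth knowing: it consumes the digit on both sides of the hyphen, so input where several single digits are separated by hyphens (such as 1-2-3, which is not among the asker's examples) is split into more than one match.

```python
import re

# Same pattern as in the answer, spread out with comments (re.VERBOSE
# ignores the whitespace and the '#' comments inside the pattern).
DIGIT_RUN = re.compile(
    r"""
    (?:          # non-capturing group, repeated one or more times
        \d-\d    #   a digit, a hyphen, a digit
      | \d       #   or a single digit
    )+
    """,
    re.VERBOSE,
)

# With no capturing groups, findall returns the full text of each match.
print(DIGIT_RUN.findall("1234-546-234235"))  # ['1234-546-234235']
print(DIGIT_RUN.findall("abc1231241"))       # ['1231241']

# Edge case (an assumption about inputs, not covered by the question):
# single digits between hyphens are split, because '1-2' uses up the '2'
# and nothing in the pattern can then start at '-3'.
print(DIGIT_RUN.findall("1-2-3"))            # ['1-2', '3']
```

If such input can occur, a pattern like \d+(?:-\d+)* is one possible substitute; it is not from the original answer, but it treats 1-2-3 as a single run.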

The reason to use (?:\d-\d|\d)+ instead of [\d-]+ is to avoid replacing hyphens that are not between digits. With [\d-]+, a hyphenated word such as my-name would become mynumbrname, and abc-1234 would become abcnumbr. With (?:\d-\d|\d)+, my-name is left untouched and abc-1234 becomes abc-numbr.
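The difference between the two patterns can be checked with a short sketch like the following. The names NAIVE and SAFER are illustrative only.

```python
import re

NAIVE = re.compile(r"[\d-]+")          # any mix of digits and hyphens
SAFER = re.compile(r"(?:\d-\d|\d)+")   # hyphens only when between digits

samples = ["my-name", "abc-1234"]

naive_out = [NAIVE.sub("numbr", s) for s in samples]
safer_out = [SAFER.sub("numbr", s) for s in samples]

print(naive_out)  # ['mynumbrname', 'abcnumbr']
print(safer_out)  # ['my-name', 'abc-numbr']
```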

Upvotes: 5
