blackfury

Reputation: 685

Regex for replacing specific numbers of a sentence

I have a sentence like the one below:

test_str = r'Mr.X has 23 apples and 59 oranges, his business partner from Colorado staying staying in hotel with phone number +188991234 and his wife and kids are staying away from him'

I would like to replace all digits in the above sentence with '0', except in the phone number, which should keep only its leading +1 and have the remaining digits masked with '*'.

result = r'Mr.X has 00 apples and 00 oranges, his business partner from Colorado staying staying in hotel with phone number +1******** and his wife and kids are staying away from him'

I have the following regex to replace the phone number pattern (which always has a consistent number of digits).

result = re.sub(r'(.*)?(\+1)(\d{8})', r'\1\2********', test_str)

Could I also replace the other digits with 0, while leaving the phone number masked as above, in one single regex?

Upvotes: 0

Views: 206

Answers (2)

The fourth bird

Reputation: 163277

If you want to keep the optional +1 and the first 3 digits of the phone number, and mask the remaining 5 digits, using a single pattern:

(?<!\S)((?:\+1)?)(\d{3})(\d{5})(?!\S)|\d+

In parts

(?<!     Negative lookbehind
  \S     Match any char except a whitespace char
)        Close lookbehind
(        Capture group 1
  (?:    Non capture group
    \+   Match + char
    1    Match 1 char
  )?     Close group and repeat 0 or 1 times
)        Close group
(        Capture group 2
  \d{3}  Match a digit and repeat 3 times
)        Close group
(        Capture group 3
  \d{5}  Match a digit and repeat 5 times
)        Close group
(?!      Negative lookahead
  \S     Match any char except a whitespace char
)        Close lookahead
|        Or
\d+      Match a digit and repeat 1 or more times
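
The breakdown above can also be kept next to the pattern itself. The following is a minimal sketch, assuming Python's re.VERBOSE flag is acceptable in your code base; the names PATTERN and mask are illustrative and not part of the original answer.

```python
import re

# Same pattern as above, written in verbose mode so each part carries a comment.
PATTERN = re.compile(r"""
    (?<!\S)        # not preceded by a non-whitespace char
    ((?:\+1)?)     # group 1: optional +1 prefix
    (\d{3})        # group 2: first 3 digits, kept as-is
    (\d{5})        # group 3: last 5 digits, to be masked
    (?!\S)         # not followed by a non-whitespace char
    |
    \d+            # any other run of digits
""", re.VERBOSE)

def mask(match):
    # Group 2 is only set when the phone-number alternative matched.
    if match.group(2):
        return match.group(1) + match.group(2) + "*" * len(match.group(3))
    return "0" * len(match.group())

masked = PATTERN.sub(mask, "23 apples, phone +188991234")
print(masked)  # 00 apples, phone +1889*****
```

Because the phone-number alternative comes first, it is tried before the generic \d+ at every position, so the fallback only applies to digit runs that are not a standalone 8-digit number.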


Example code

import re

pattern = r"(?<!\S)((?:\+1)?)(\d{3})(\d{5})(?!\S)|\d+"

s = ("Mr.X has 23 apples and 59 oranges, his business partner from Colorado staying staying in hotel with phone number +188991234 and his wife and kids are staying away from him\n\n"
            "This is a tel 12345678 and this is 1234567 123456789")

result = re.sub(
    pattern,
    lambda x: x.group(1) + x.group(2) + "*" * len(x.group(3)) if x.group(2) else "0" * len(x.group()),
    s)
print(result)

Output

Mr.X has 00 apples and 00 oranges, his business partner from Colorado staying staying in hotel with phone number +1889***** and his wife and kids are staying away from him

This is a tel 123***** and this is 0000000 000000000
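
If only the +1 should survive, as in the result shown in the question, the same idea can be adapted. This is a sketch under the assumption that a phone number is always +1 followed by exactly 8 digits; the variable and function names are illustrative.

```python
import re

# Assumption: phone numbers are "+1" followed by exactly 8 digits.
pattern = r"(?<!\S)(\+1)(\d{8})(?!\S)|\d+"

def mask(match):
    # Group 1 is only set when the phone-number alternative matched.
    if match.group(1):
        return match.group(1) + "*" * len(match.group(2))
    return "0" * len(match.group())

text = "Mr.X has 23 apples and 59 oranges, phone number +188991234 and more"
result = re.sub(pattern, mask, text)
print(result)
# Mr.X has 00 apples and 00 oranges, phone number +1******** and more
```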

Upvotes: 0

Skycc

Reputation: 3555

We can use re.sub with a replacement function.

To replace the phone number, use the regex below. All digits following +1 are replaced by the same number of * characters.

result = re.sub(r'(?<!\w)(\+1)(\d+)', lambda x:x.group(1) + '*'*len(x.group(2)), test_str)

To replace the other numbers with 0, use the regex below. Every run of digits that is not preceded by + or by another digit is replaced by the same number of 0 characters.

result = re.sub(r'(?<![\+\d])(\d+)', lambda x:'0'*len(x.group(1)), test_str)
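
To see which parts of the string each of the two patterns picks up, re.findall can be used as a quick check. This is only an illustrative sketch; the variable names plain and phone are not part of the answer.

```python
import re

test_str = "Mr.X has 23 apples and 59 oranges, his phone number +188991234"

# Digit runs not preceded by "+" or another digit: the phone number is skipped,
# because its first digit follows "+" and every later digit follows a digit.
plain = re.findall(r"(?<![\+\d])\d+", test_str)

# "+1" followed by digits, not preceded by a word character.
phone = re.findall(r"(?<!\w)\+1\d+", test_str)

print(plain)  # ['23', '59']
print(phone)  # ['+188991234']
```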

Example:

>>> test_str = r'Mr.X has 23 apples and 59 oranges, his phone number +188991234'
>>> result = re.sub(r'(?<!\w)(\+1)(\d+)', lambda x:x.group(1) + '*'*len(x.group(2)), test_str)
>>> result = re.sub(r'(?<![\+\d])(\d+)', lambda x:'0'*len(x.group(1)), result)
>>> result
'Mr.X has 00 apples and 00 oranges, his phone number +1********'

Add-on for the follow-up question in the comments: to retain the first 3 characters after the + sign, we only need to modify the first regex (the +1 portion), while the second regex remains the same.

>>> test_str = r'Mr.X has 23 apples and 59 oranges, his phone number +188991234'
>>> result = re.sub(r'(?<!\w)(\+\d{3})(\d+)', lambda x:x.group(1) + '*'*len(x.group(2)), test_str)
>>> result = re.sub(r'(?<![\+\d])(\d+)', lambda x:'0'*len(x.group(1)), result)
>>> result
'Mr.X has 00 apples and 00 oranges, his phone number +188******'
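
Both variants can be folded into one helper. The sketch below is hypothetical: the function name mask_numbers and its keep parameter are assumptions made for illustration, and it assumes every phone number starts with +1.

```python
import re

def mask_numbers(text, keep=0):
    """Two-pass masking; `keep` is how many digits after +1 stay visible."""
    # Pass 1: keep "+1" plus `keep` more digits, replace the rest with "*".
    phone = re.compile(r"(?<!\w)(\+1\d{%d})(\d+)" % keep)
    text = phone.sub(lambda m: m.group(1) + "*" * len(m.group(2)), text)
    # Pass 2: zero out digit runs not preceded by "+" or another digit.
    return re.sub(r"(?<![\+\d])\d+", lambda m: "0" * len(m.group()), text)

test_str = "Mr.X has 23 apples and 59 oranges, his phone number +188991234"
print(mask_numbers(test_str))          # ... his phone number +1********
print(mask_numbers(test_str, keep=2))  # ... his phone number +188******
```

The second pass leaves the visible phone digits alone, since each of them follows either the + sign or another digit.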

Upvotes: 1
