Reputation: 1787
The goal is to keep the digits at the start of a string but remove any digits that appear elsewhere in it.
For instance, the numbers in these lines should be kept:
123456 AB
123456 GENERAL
123456 HOSPITAL
On the other hand, these numbers should be removed:
PROJECT 150000 SCHOLARSHIPS
SUMMERLAND 05 100 SCHOOL 100 ABC
ABC HOSPITAL 01 20 30 GENERAL
ABC HOSPITAL 01
I have written this regex, which comes very close to the desired behaviour when each match is replaced with an empty string:
(?<=\w\b )([0-9]*)
However, removing the digits leaves an additional space behind, which comes from the space that preceded them:
123456 AB
123456 GENERAL
123456 HOSPITAL
PROJECT  SCHOLARSHIPS
SUMMERLAND   SCHOOL  ABC
ABC HOSPITAL    GENERAL
ABC HOSPITAL
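A minimal reproduction of the problem, assuming Python's re module (the question does not name a language, but the answers use Python). The pattern consumes only the digits, so the space in front of each removed number stays behind, and consecutive numbers leave several spaces in a row:

```python
import re

# Sample lines from the question.
lines = [
    "123456 AB",
    "123456 GENERAL",
    "123456 HOSPITAL",
    "PROJECT 150000 SCHOLARSHIPS",
    "SUMMERLAND 05 100 SCHOOL 100 ABC",
    "ABC HOSPITAL 01 20 30 GENERAL",
    "ABC HOSPITAL 01",
]

# The question's pattern: a run of digits preceded by "<word char><space>".
# Only the digits are consumed; the space before them is left in place.
pattern = re.compile(r"(?<=\w\b )([0-9]*)")

results = [pattern.sub("", line) for line in lines]

for line in results:
    # repr() makes leftover double and trailing spaces visible.
    print(repr(line))
```

For example, "SUMMERLAND 05 100 SCHOOL 100 ABC" becomes "SUMMERLAND   SCHOOL  ABC", and "ABC HOSPITAL 01" ends with a trailing space.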
How can I get rid of this space?
Upvotes: 0
Views: 164
Reputation: 163362
To keep the first digits in the string, you could also use a capturing group with an alternation instead of a lookbehind: capture what you want to keep in a group, and match what you don't want to keep.
^([^\S\r\n]*\d+)|\d+[^\S\r\n]*
- ^ Start of string (start of each line when re.MULTILINE is used)
- ( Capture group 1 (what you want to keep)
  - [^\S\r\n]*\d+ Match optional whitespace chars except newlines, then 1+ digits
- ) Close group
- | Or
- \d+[^\S\r\n]* Match 1+ digits followed by optional whitespace chars except newlines (what you want to remove)

For example, in Python:
import re
result = re.sub(regex, r'\1', test_str, flags=re.MULTILINE)
Output
123456 AB
123456 GENERAL
123456 HOSPITAL
PROJECT SCHOLARSHIPS
SUMMERLAND SCHOOL ABC
ABC HOSPITAL GENERAL
ABC HOSPITAL
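A self-contained sketch of the same substitution, assuming the sample lines are joined into one multi-line string. When the second alternative matches, group 1 does not participate, and since Python 3.5 the \1 in the replacement then expands to an empty string, which is what deletes those digits:

```python
import re

# Group 1 captures digits at the start of a line (kept);
# the second alternative matches any other digits plus the spaces after them (removed).
regex = r"^([^\S\r\n]*\d+)|\d+[^\S\r\n]*"

test_str = """123456 AB
123456 GENERAL
123456 HOSPITAL
PROJECT 150000 SCHOLARSHIPS
SUMMERLAND 05 100 SCHOOL 100 ABC
ABC HOSPITAL 01 20 30 GENERAL
ABC HOSPITAL 01"""

# re.MULTILINE makes ^ match at the start of every line, not just the string.
result = re.sub(regex, r"\1", test_str, flags=re.MULTILINE)
print(result)
```

Note that the last line comes out as "ABC HOSPITAL " with a trailing space, because the space before "01" is not part of the match.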
Upvotes: 1
Reputation: 10090
You should be able to simply include the space in the character class inside the capturing group, like this:
(?<=\w\b )([ 0-9]*)
(note the additional space at the start of the character class [ 0-9])
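A quick check of this pattern in Python's re module (the language is an assumption, matching the other answer). Because the character class now also accepts spaces, each removed run of numbers takes the spaces after it along, so only one separator remains:

```python
import re

lines = [
    "123456 AB",
    "123456 GENERAL",
    "123456 HOSPITAL",
    "PROJECT 150000 SCHOLARSHIPS",
    "SUMMERLAND 05 100 SCHOOL 100 ABC",
    "ABC HOSPITAL 01 20 30 GENERAL",
    "ABC HOSPITAL 01",
]

# Same lookbehind as the question, but [ 0-9]* also consumes the spaces
# that follow the digits, e.g. "05 100 " is removed as one match.
pattern = re.compile(r"(?<=\w\b )([ 0-9]*)")

results = [pattern.sub("", line) for line in lines]
for line in results:
    print(repr(line))
```

Two side effects worth knowing: a line ending in a number, such as "ABC HOSPITAL 01", keeps a trailing space (line.rstrip() removes it), and runs of several spaces after a word are collapsed too, since the class matches spaces even when no digits follow.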
Upvotes: 3