JarochoEngineer

Reputation: 1787

Regex Python: Keep first digits

The objective is to keep the digits at the start of a string, but remove digits that appear anywhere else.

For instance, these numbers should be kept:

123456 AB
123456 GENERAL
123456 HOSPITAL

On the other hand, these numbers should be removed:

PROJECT 150000 SCHOLARSHIPS
SUMMERLAND 05 100 SCHOOL 100 ABC
ABC HOSPITAL 01 20 30 GENERAL
ABC HOSPITAL 01

I have crafted this regex, which comes very close to the desired behaviour when I substitute its matches with an empty string:

(?<=\w\b )([0-9]*)

However, I am getting an additional space when removing the digits, which comes from the preceding space:

123456 AB
123456 GENERAL
123456 HOSPITAL

PROJECT  SCHOLARSHIPS
SUMMERLAND   SCHOOL  ABC
ABC HOSPITAL    GENERAL
ABC HOSPITAL 
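
The behaviour can be reproduced with a short script. This is only a minimal sketch: the names `pattern`, `lines`, and `results` are illustrative, and the inputs are a subset of the samples above.

```python
import re

# Pattern from the question: digits preceded by a word character and one space.
pattern = re.compile(r"(?<=\w\b )([0-9]*)")

lines = [
    "123456 AB",
    "PROJECT 150000 SCHOLARSHIPS",
    "ABC HOSPITAL 01",
]

# Substitute every match with an empty string.
results = [pattern.sub("", line) for line in lines]

for original, cleaned in zip(lines, results):
    # repr() makes the doubled and trailing spaces visible.
    print(repr(original), "->", repr(cleaned))
```

The space before the digits is consumed only by the lookbehind, so it is never part of the match and stays in the output next to the space that followed the digits.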

How can I get rid of this space?

Upvotes: 0

Views: 164

Answers (2)

The fourth bird

Reputation: 163362

To keep the first digits in the string, you could also use a capturing group with an alternation instead of a lookbehind. Capture in a group what you want to keep, and match what you don't want to keep.

^([^\S\r\n]*\d+)|\d+[^\S\r\n]*
  • ^ Start of string (or start of each line when re.MULTILINE is used)
  • ( Capture group 1 (what you want to keep)
    • [^\S\r\n]*\d+ Match optional whitespace chars except newlines, match 1+ digits
  • ) Close group
  • | Or
  • \d+[^\S\r\n]* Match 1+ digits followed by optional whitespace chars except newlines (What you want to remove)


For example

result = re.sub(regex, r'\1', test_str, flags=re.MULTILINE)

Output

123456 AB
123456 GENERAL
123456 HOSPITAL

PROJECT SCHOLARSHIPS
SUMMERLAND SCHOOL ABC
ABC HOSPITAL GENERAL
ABC HOSPITAL 
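
For completeness, the capture-and-alternation approach could be run end to end roughly as follows. This is a sketch assuming the sample lines from the question are joined into one multi-line string; `regex`, `test_str`, and `result` are just illustrative variable names.

```python
import re

# Group 1 keeps leading digits; the second alternative matches digits
# (plus trailing horizontal whitespace) anywhere else so they can be dropped.
regex = r"^([^\S\r\n]*\d+)|\d+[^\S\r\n]*"

test_str = "\n".join([
    "123456 AB",
    "123456 GENERAL",
    "123456 HOSPITAL",
    "PROJECT 150000 SCHOLARSHIPS",
    "SUMMERLAND 05 100 SCHOOL 100 ABC",
    "ABC HOSPITAL 01 20 30 GENERAL",
    "ABC HOSPITAL 01",
])

# When the second alternative matches, group 1 did not participate and
# r"\1" expands to an empty string (Python 3.5+), removing the match.
result = re.sub(regex, r"\1", test_str, flags=re.MULTILINE)
print(result)
```

The last line still ends with a space, because only the whitespace after the removed digits is consumed, not the whitespace before them.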

Upvotes: 1

wpercy

Reputation: 10090

You should be able to just include the space in the character class inside the capturing group, like this:

(?<=\w\b )([ 0-9]*)
            ^ additional space
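
A quick way to check this variant is a sketch like the one below, which assumes the sample lines from the question; `pattern`, `lines`, and `cleaned` are illustrative names.

```python
import re

# Same lookbehind as in the question, but the character class now also
# accepts spaces, so the space following each digit run is removed too.
pattern = re.compile(r"(?<=\w\b )([ 0-9]*)")

lines = [
    "123456 AB",
    "PROJECT 150000 SCHOLARSHIPS",
    "SUMMERLAND 05 100 SCHOOL 100 ABC",
    "ABC HOSPITAL 01 20 30 GENERAL",
    "ABC HOSPITAL 01",
]

cleaned = [pattern.sub("", line) for line in lines]

for line in cleaned:
    # repr() makes any trailing space visible.
    print(repr(line))
```

Because the lookbehind requires a word character and a space before the match, digits at the very start of the line are never matched and are therefore kept.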

Upvotes: 3
