Rahul Agarwal

Reputation: 4100

Capture Number after a phrase

I have strings like:

  1. Your signing bonus is 123,000
  2. This year signing bonus is bad. the signing bonus for this year is EUR 123,000
  3. The bonus is 14,456, but signing bonus.

I want the output like:

a) If a number follows "signing bonus", keep that part of the string and remove everything else. See Expected Output 1 & 2.

b) If no number follows "signing bonus", I should get the first part of the string. See Expected Output 3.

Expected Output

  1. is 123,000

  2. for this year is EUR 123,000

  3. The bonus is 14,456, but

My Regex:

match1 = re.findall(r'(?<=\bSigning Bonus\b)\s*(?:\S+\b\s*){0,8}',value, re.I|re.M|re.DOTALL)

It handles Output 1 and Output 2 but can't handle Output 3.

I am also open to a solution that does not use regex.

Upvotes: 0

Views: 67

Answers (3)

Sabareesh

Reputation: 751

This will print your answer:

statements = [
    'Your signing bonus is 123,000',
    'This year signing bonus is bad. the signing bonus for this year is EUR 123,000',
    'The bonus is 14,456, but signing bonus.',
]
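for statement in statements:
    # Check the segments around 'signing bonus' from last to first
    for part in reversed(statement.split('signing bonus')):
        # Keep the first segment (counting from the end) that contains a number
        if any(word.replace(',', '').isdigit() for word in part.split()):
            print(part.strip())
            break  # stop after one segment so each statement prints at most once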

Output:

is 123,000
for this year is EUR 123,000
The bonus is 14,456, but

Upvotes: 0

Amit Nanaware

Reputation: 3346

Try the code below.

import re

s1 = "Your signing bonus is 123,000"
s2 = "This year signing bonus is bad. the signing bonus for this year is EUR 123,000"
s3 = "The bonus is 14,456, but signing bonus."
regex = '[0-9]'

def format_string(s):
    # Print each piece around 'signing bonus' that contains a digit
    for subs in s.split('signing bonus'):
        if re.search(regex, subs):
            print(subs.strip())

format_string(s1)
format_string(s2)
format_string(s3)

Output:

is 123,000
for this year is EUR 123,000
The bonus is 14,456, but
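
The loop above prints every segment that contains a digit, so an input with numbers on both sides of the phrase would produce two lines. A minimal sketch of a variant that returns a single segment instead, assuming the last digit-bearing segment is the one wanted and that matching should ignore case as in the question's regex (`extract_segment` is a hypothetical name, not part of the original answer):

```python
import re

def extract_segment(text, phrase="signing bonus"):
    """Return the last segment around `phrase` that contains a digit, or None."""
    # Split case-insensitively so 'Signing Bonus' also acts as a delimiter
    parts = re.split(re.escape(phrase), text, flags=re.IGNORECASE)
    # Walk from the end so a number after the phrase wins over one before it
    for part in reversed(parts):
        if re.search(r"\d", part):
            return part.strip()
    return None  # no segment contains a digit

print(extract_segment("Your signing bonus is 123,000"))
print(extract_segment("The bonus is 14,456, but signing bonus."))
```

Returning the value rather than printing it makes the function easier to reuse, for example in a list comprehension over many strings.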

Upvotes: 4

Pushpesh Kumar Rajwanshi

Reputation: 18357

If you are okay with using re.sub, you can use this regex to replace the matched text with an empty string:

^[^\d\n]*signing bonus\s*|\s*signing bonus[^\d\n]*$

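In the first two cases you want the text after signing bonus, but in the third case the text you want comes before signing bonus, so the regex needs two alternatives. The first removes a digit-free prefix up to and including signing bonus; the second removes signing bonus together with a digit-free tail at the end of the string.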


Python code:

import re

arr = ['Your signing bonus is 123,000','This year signing bonus is bad. the signing bonus for this year is EUR 123,000','The bonus is 14,456, but signing bonus.']

for s in arr:
    print(s, '-->', re.sub(r'^[^\d\n]*signing bonus\s*|\s*signing bonus[^\d\n]*$', '', s))

Prints:

Your signing bonus is 123,000 --> is 123,000
This year signing bonus is bad. the signing bonus for this year is EUR 123,000 --> for this year is EUR 123,000
The bonus is 14,456, but signing bonus. --> The bonus is 14,456, but
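
To make the two alternatives easier to read, here is a sketch of the same pattern written with re.VERBOSE. The re.IGNORECASE flag is an addition, assumed from the re.I flag in the question; `PATTERN` and `strip_phrase` are hypothetical names used only for illustration.

```python
import re

# Same two alternatives as the pattern above, spelled out with re.VERBOSE.
# In verbose mode unescaped whitespace is ignored, so the literal space in
# the phrase is written as "\ ".
PATTERN = re.compile(
    r"""
    ^[^\d\n]* signing\ bonus \s*     # digit-free prefix ending in the phrase
    |
    \s* signing\ bonus [^\d\n]* $    # phrase plus a digit-free tail at the end
    """,
    re.VERBOSE | re.IGNORECASE,
)

def strip_phrase(text):
    """Remove whichever alternative matches and return what is left."""
    return PATTERN.sub("", text)

for s in [
    "Your signing bonus is 123,000",
    "The bonus is 14,456, but signing bonus.",
]:
    print(strip_phrase(s))
```

Because `[^\d\n]*` is greedy, the first alternative consumes up to the last occurrence of the phrase that appears before any digit, which is why the second example string in the question keeps only the text after its second "signing bonus".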

Upvotes: 2
