Reputation: 4100
I have strings like:
- Your signing bonus is 123,000
- This year signing bonus is bad. the signing bonus for this year is EUR 123,000
- The bonus is 14,456, but signing bonus.
I want output like this:
a) If a number follows "signing bonus",
keep that part of the string and remove everything else. See Expected Output 1 & 2.
b) If no number follows "signing bonus",
I should get the first part of the string. See Expected Output 3.
Expected Output
is 123,000
for this year is EUR 123,000
The bonus is 14,456, but
My Regex:
match1 = re.findall(r'(?<=\bSigning Bonus\b)\s*(?:\S+\b\s*){0,8}',value, re.I|re.M|re.DOTALL)
It handles Output 1 and Output 2 but can't handle Output 3.
I am also open to solutions that don't use regex.
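A minimal sketch of why the pattern above misses the third case: the lookbehind pins the match to the position right after the phrase, so text that sits before it (where 14,456 is) can never be part of a match. The helper name `text_after_phrase` is made up for illustration; the pattern and flags are copied from the question.

```python
import re

# The pattern from the question, unchanged: it only consumes text AFTER the phrase.
PATTERN = r'(?<=\bSigning Bonus\b)\s*(?:\S+\b\s*){0,8}'

def text_after_phrase(value):
    """Return every match of the question's pattern (hypothetical helper)."""
    return re.findall(PATTERN, value, re.I | re.M | re.DOTALL)

for s in (
    'Your signing bonus is 123,000',
    'The bonus is 14,456, but signing bonus.',
):
    # For the second string nothing containing the number can be returned,
    # because the number appears before the phrase.
    print(repr(s), '->', text_after_phrase(s))
```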
Upvotes: 0
Views: 67
Reputation: 751
This will print your answer:
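statements = [
    'Your signing bonus is 123,000',
    'This year signing bonus is bad. the signing bonus for this year is EUR 123,000',
    'The bonus is 14,456, but signing bonus.',
]
for statement in statements:
    # str.split always returns at least one piece, so no empty check is needed.
    ans = statement.split('signing bonus')
    # Walk the pieces from last to first.
    for i in range(len(ans) - 1, -1, -1):
        for word in ans[i].split(' '):
            try:
                # A word counts as a number if it parses as an int once commas are removed.
                int(word.replace(',', ''))
            except ValueError:
                continue
            print(ans[i].strip())
            # Leaves only the inner loop: any other piece with a number is printed too.
            break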
Output:
is 123,000
for this year is EUR 123,000
The bonus is 14,456, but
Upvotes: 0
Reputation: 3346
Try the code below.
import re

s1 = "Your signing bonus is 123,000"
s2 = "This year signing bonus is bad. the signing bonus for this year is EUR 123,000"
s3 = "The bonus is 14,456, but signing bonus."
regex = r'[0-9]'

def format_string(s):
    # Print every piece around 'signing bonus' that contains a digit.
    for subs in s.split('signing bonus'):
        if re.search(regex, subs):
            print(subs.strip())

format_string(s1)
format_string(s2)
format_string(s3)
Output:
is 123,000
for this year is EUR 123,000
The bonus is 14,456, but
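The same split-and-check idea can be sketched as a function that returns the result instead of printing it and ignores case, since the question used re.I. The name `keep_numeric_part` and the choice to prefer the last piece containing a digit are my assumptions, not part of the answer above.

```python
import re

def keep_numeric_part(text, phrase='signing bonus'):
    """Return the last piece around `phrase` that contains a digit, else ''.

    Hypothetical helper: scanning from the end prefers text after the
    phrase and falls back to text before it.
    """
    # re.escape keeps the phrase literal; IGNORECASE also matches 'Signing Bonus'.
    pieces = re.split(re.escape(phrase), text, flags=re.IGNORECASE)
    for piece in reversed(pieces):
        if re.search(r'\d', piece):
            return piece.strip()
    return ''

for s in (
    'Your Signing Bonus is 123,000',
    'The bonus is 14,456, but signing bonus.',
):
    print(keep_numeric_part(s))
```

Returning a value makes it easy to apply this to a list or a DataFrame column rather than printing line by line.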
Upvotes: 4
Reputation: 18357
If you are okay with using re.sub, you can use this regex to replace the matched text with an empty string:
^[^\d\n]*signing bonus\s*|\s*signing bonus[^\d\n]*$
In the first two cases, you intend to capture the string after signing bonus, but in the third case the intended string is before signing bonus, so that case needs a second pattern, joined to the first with alternation.
Python code:
import re
arr = ['Your signing bonus is 123,000','This year signing bonus is bad. the signing bonus for this year is EUR 123,000','The bonus is 14,456, but signing bonus.']
for s in arr:
    print(s, '-->', re.sub(r'^[^\d\n]*signing bonus\s*|\s*signing bonus[^\d\n]*$', '', s))
Prints:
Your signing bonus is 123,000 --> is 123,000
This year signing bonus is bad. the signing bonus for this year is EUR 123,000 --> for this year is EUR 123,000
The bonus is 14,456, but signing bonus. --> The bonus is 14,456, but
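To make the two branches of the alternation easier to read, here is a sketch of the same pattern written with re.VERBOSE. The names `PATTERN` and `strip_phrase` are made up for illustration. In verbose mode plain spaces are ignored, so the space in the phrase is written as `[ ]`.

```python
import re

PATTERN = re.compile(r"""
    ^[^\d\n]*signing[ ]bonus\s*    # branch 1: digit-free text from the start through the phrase
    |                              # or
    \s*signing[ ]bonus[^\d\n]*$    # branch 2: the phrase plus digit-free text up to the end
""", re.VERBOSE)

def strip_phrase(text):
    """Remove whichever branch matches (hypothetical helper)."""
    return PATTERN.sub('', text)

for s in (
    'Your signing bonus is 123,000',
    'The bonus is 14,456, but signing bonus.',
):
    print(s, '-->', strip_phrase(s))
```

Branch 1 handles a number after the phrase, because the greedy `[^\d\n]*` backtracks to the last occurrence of the phrase before any digit. Branch 2 handles a number before the phrase.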
Upvotes: 2