Reputation: 105
I want to extract the net profit figure from a statement, with 'net profit' itself as the non-capturing part. I'm not sure how to do it (maybe a non-capturing lookbehind?).
Example:
'business venture of net profit 23.5 million dollars'
Required output:
23.5 million
I applied the following regex:
(net|nt)\s*\.?\s*(profit|earnings)\s*\.?\s*\d+\.?\d*\.?\s*(?:lakh|crore|million)
But it returns
[('net', 'profit')]
as the output.
Upvotes: 0
Views: 461
Reputation: 5950
You can use (?:...) for a non-capturing group, and put capturing groups around the number and the unit instead:
import re
s = 'business venture of net profit 23.5 million dollars'
re.findall(r'(?:net|nt)\s*\.?\s*(?:profit|earnings)\s*\.?\s*(\d+\.?\d*)\.?\s*(lakh|crore|million)',s)
[('23.5', 'million')]
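The reason the original attempt printed [('net', 'profit')] is how re.findall treats groups: with no capturing groups it returns whole matches, with one group it returns that group, and with several it returns tuples of groups. A minimal sketch of the three cases follows; the variable names original, no_groups and one_group are only illustrative.

```python
import re

s = 'business venture of net profit 23.5 million dollars'

# Capturing groups around the keywords: findall returns only those groups.
original = r'(net|nt)\s*\.?\s*(profit|earnings)\s*\.?\s*\d+\.?\d*\.?\s*(?:lakh|crore|million)'
print(re.findall(original, s))   # [('net', 'profit')]

# No capturing groups at all: findall returns the whole match.
no_groups = r'(?:net|nt)\s*\.?\s*(?:profit|earnings)\s*\.?\s*\d+\.?\d*\.?\s*(?:lakh|crore|million)'
print(re.findall(no_groups, s))  # ['net profit 23.5 million']

# Exactly one capturing group: findall returns a flat list of strings.
one_group = r'(?:net|nt)\s*\.?\s*(?:profit|earnings)\s*\.?\s*(\d+\.?\d*\.?\s*(?:lakh|crore|million))'
print(re.findall(one_group, s))  # ['23.5 million']
```

The last form gives the single string '23.5 million' asked for in the question, rather than a tuple.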
Upvotes: 1
Reputation: 1769
You didn't capture the digit group. You also need non-capturing groups around 'net' and 'profit',
so this should work:
Edit: updated to also capture the unit (million, etc.).
import re
s = 'business venture of net profit 23.5 million dollars'
re.findall(r'(?:net|nt)\s*\.?\s*(?:profit|earnings)\s*\.?\s*(\d+\.?\d*)\.?\s*(lakh|crore|million)', s)
# output: [('23.5', 'million')]
Example at: https://regex101.com/r/EXCzeV/2
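On the lookbehind idea from the question: as far as I know, Python's re module only accepts fixed-width lookbehinds, so it works for a literal prefix such as 'net profit ' but not for the flexible \s* spacing used above. A rough sketch, assuming the prefix is always exactly 'net profit ' followed by a single space:

```python
import re

s = 'business venture of net profit 23.5 million dollars'

# Fixed-width lookbehind: the prefix is checked but is not part of the match.
m = re.search(r'(?<=net profit )\d+(?:\.\d+)?\s*(?:lakh|crore|million)', s)
print(m.group(0))  # 23.5 million

# A variable-width lookbehind (here \s*) is rejected when the pattern is compiled.
variable_width_rejected = False
try:
    re.compile(r'(?<=net\s*profit\s*)\d+')
except re.error:
    variable_width_rejected = True
print(variable_width_rejected)  # True
```

Because of that restriction, the non-capturing-group version is usually the more flexible choice here.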
Upvotes: 1
Reputation: 7476
Try the regex below; the result will be in group 1:
(?:ne?t\s(?:profit|earnings?)\s)([\d\.]+\s(?:million|lakh|crore))
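To read group 1 in Python, re.search plus .group(1) is enough. The sketch below uses a variant of this pattern; the 'earnings?' and 'lakh' spellings follow the question's wording and are my assumption, and extract_net_profit is just an illustrative helper name.

```python
import re

# Variant of the pattern above; 'earnings?' and 'lakh' are assumed spellings
# taken from the question, not a confirmed part of this answer.
pattern = re.compile(r'(?:ne?t\s(?:profit|earnings?)\s)([\d.]+\s(?:million|lakh|crore))')

def extract_net_profit(text):
    """Return the amount and unit after 'net profit', or None if absent."""
    m = pattern.search(text)
    return m.group(1) if m else None

print(extract_net_profit('business venture of net profit 23.5 million dollars'))  # 23.5 million
print(extract_net_profit('no figures here'))  # None
```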
Upvotes: 2