Aditya Kuls

Reputation: 105

Non-capturing look-behind in Python regex

I want to extract the net profit figure from a statement, with 'net profit' matched but not captured. I'm not sure how to do it (maybe a non-capturing look-behind?).

For example:

'business venture of net profit 23.5 million dollars'

Required output:

23.5 million

I applied the following regex with re.findall:

(net|nt)\s*\.?\s*(profit|earnings)\s*\.?\s*\d+\.?\d*\.?\s*(?:lakh|crore|million)

But it returns

[('net', 'profit')]

as the output.

Upvotes: 0

Views: 461

Answers (3)

akash karothiya

Reputation: 5950

You can use (?:...) to make a group non-capturing, and put capturing groups around the number and the unit:

import re
s = 'business venture of net profit 23.5 million dollars'
re.findall(r'(?:net|nt)\s*\.?\s*(?:profit|earnings)\s*\.?\s*(\d+\.?\d*)\.?\s*(lakh|crore|million)',s)
# output: [('23.5', 'million')]
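To see why the original pattern returned [('net', 'profit')], here is a small sketch of how re.findall behaves with capturing groups: when a pattern contains groups, findall returns the groups rather than the whole match. The variable names and the single-group variant of the pattern are my own illustration, not part of the question.

```python
import re

s = 'business venture of net profit 23.5 million dollars'

# Original pattern: the only capturing groups are (net|nt) and
# (profit|earnings), so findall returns those two groups as a tuple.
original = r'(net|nt)\s*\.?\s*(profit|earnings)\s*\.?\s*\d+\.?\d*\.?\s*(?:lakh|crore|million)'
original_result = re.findall(original, s)

# Illustrative variant: the prefix groups are non-capturing and a single
# capturing group wraps number and unit, so findall returns plain strings.
single = r'(?:net|nt)\s*\.?\s*(?:profit|earnings)\s*\.?\s*(\d+\.?\d*\s*(?:lakh|crore|million))'
single_result = re.findall(single, s)

print(original_result)
print(single_result)
```

With exactly one capturing group, findall returns a list of strings instead of a list of tuples, which is closer to the '23.5 million' output asked for.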

Upvotes: 1

Ludisposed

Reputation: 1769

You didn't capture the digit group. You also need to make the groups for 'net' and 'profit' non-capturing, so this should work (edited to also capture million, etc.):

import re
s = 'business venture of net profit 23.5 million dollars'
re.findall(r'(?:net|nt)\s*\.?\s*(?:profit|earnings)\s*\.?\s*(\d+\.?\d*)\.?\s*(lakh|crore|million)', s)
# output: [('23.5', 'million')]

Example at: https://regex101.com/r/EXCzeV/2
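On the look-behind idea from the question: Python's re module only accepts fixed-width look-behind patterns, so it can assert a literal prefix such as 'net profit ' but not one with \s* or alternatives of different lengths. A rough sketch, assuming the prefix is always exactly 'net profit ' with one space (the variable names are illustrative):

```python
import re

s = 'business venture of net profit 23.5 million dollars'

# Fixed-width look-behind: the prefix is asserted but is not part of the match.
m = re.search(r'(?<=net profit )\d+(?:\.\d+)?\s*(?:lakh|crore|million)', s)
lookbehind_result = m.group(0) if m else None

# A variable-width look-behind (here because of \s*) is rejected by re
# when the pattern is compiled.
try:
    re.compile(r'(?<=net\s*profit\s*)\d+')
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(lookbehind_result)
print(variable_width_ok)
```

Because of that restriction, the non-capturing group approach is usually the more flexible choice when the prefix can vary (net/nt, optional dots, variable spacing).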

Upvotes: 1

Abhishek Gurjar

Reputation: 7476

Try the regex below; the result will be in group 1:

(?:ne?t\s(?:profit|earnings?)\s)([\d.]+\s(?:million|lakh|crore))
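As a sketch of how group 1 could be read in Python, the helper below wraps a pattern of this shape in re.search. The function name extract_net_profit is hypothetical, and the spellings 'earnings?' and 'lakh' are assumptions chosen to match the terms used in the question.

```python
import re

# Non-capturing prefix, then one capturing group for "<number> <unit>".
# The unit and keyword spellings are assumptions taken from the question.
PATTERN = re.compile(r'(?:ne?t\s(?:profit|earnings?)\s)([\d.]+\s(?:million|lakh|crore))')


def extract_net_profit(text):
    """Return the '<number> <unit>' part after the prefix, or None if absent."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


print(extract_net_profit('business venture of net profit 23.5 million dollars'))
print(extract_net_profit('nt earnings 40 crore'))
print(extract_net_profit('no figures here'))
```

Using re.search with group(1) avoids the tuple that findall produces when there are several groups, and returning None makes the no-match case explicit.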


Upvotes: 2
