Mridang Agarwalla

Reputation: 44998

Regex pattern problem in python

I need to extract parts of a string using regex in Python.

I'm good with basic regex but I'm terrible at lookarounds. I've shown two sample records below. The last bit is always a currency field, e.g. 4,76 in the first record and 2,00 in the second. The second record also has an account number that matches the pattern \d{6}-\d{6}. Anything after that is the currency.

24.02 24.02VALINTATALO MEGAHERTSI4,76-
24.02 24.02DOE MRIDANG 157235-1234582,00-

Could you help me out with this regex? What I've written so far is given below, but it treats everything after the dash in the account number as the currency.

.*?(\d\d\.\d\d)(.*?)\s*(?<!\d{6}-\d{6})(\d*,\d\d)
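To make the problem reproducible, here is a minimal sketch that runs the pattern above against both sample records, assuming Python's built-in re module. The names PATTERN, records and currency_with_original_pattern are illustrative only. A likely reading of the failure: the lazy second group stops at the first position where digits followed by a comma can match, which is right after the dash, and the negative lookbehind never gets a chance to reject anything there.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r'.*?(\d\d\.\d\d)(.*?)\s*(?<!\d{6}-\d{6})(\d*,\d\d)')

records = [
    '24.02 24.02VALINTATALO MEGAHERTSI4,76-',
    '24.02 24.02DOE MRIDANG 157235-1234582,00-',
]

def currency_with_original_pattern(record):
    """Return group 3 (the intended currency field), or None if no match."""
    m = PATTERN.match(record)
    return m.group(3) if m else None

for record in records:
    # The first record yields 4,76; the second yields 1234582,00,
    # i.e. the second half of the account number is swallowed.
    print(currency_with_original_pattern(record))
```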

Thanks in advance

Upvotes: 0

Views: 253

Answers (4)

Dan

Reputation: 3725

This seems to work:

.*?(\d\d\.\d\d)(.*?)(?:\d{6}-\d{6})?(\d*,\d\d)

Explanation: (?:\d{6}-\d{6}) is a non-capturing group, so it matches the account number but doesn't remember it. The question mark after it allows the account number to be absent. The reason we don't want to remember the account number is that a capturing group there would throw off the index we access with match.group(3): the currency would move to index 4, and group 3 would hold the account number (or None when it is absent).
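As a quick check of the explanation above, this sketch applies the pattern to both sample records with Python's re module. The helper name split_record and the .strip() on the middle group are assumptions added for illustration, not part of the answer.

```python
import re

# Pattern from the answer; (?:...)? is an optional non-capturing group.
DAN_PATTERN = re.compile(r'.*?(\d\d\.\d\d)(.*?)(?:\d{6}-\d{6})?(\d*,\d\d)')

def split_record(record):
    """Return (date, description, currency), or None if there is no match."""
    m = DAN_PATTERN.match(record)
    if m is None:
        return None
    # The currency is always group 3, with or without an account number.
    return m.group(1), m.group(2).strip(), m.group(3)

print(split_record('24.02 24.02VALINTATALO MEGAHERTSI4,76-'))
print(split_record('24.02 24.02DOE MRIDANG 157235-1234582,00-'))
```

Note that group 1 is the first date only; the second date stays at the start of the description group.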

Upvotes: 1

tiftik

Reputation: 988

(?<=\d{6}-\d{6}|[A-Z ])[0-9,]+(?=-$)

This regex matches the first string of digits and commas that is preceded by either an account number, a letter, or a space, and is followed by a dash that is the last character of the line/string.
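One caveat for Python users: the built-in re module only accepts lookbehinds of fixed width, so a single lookbehind whose alternatives differ in length is rejected at compile time. A possible workaround, sketched here as an assumption rather than as the answerer's method, is to split it into two fixed-width lookbehinds joined by alternation. The names ORIGINAL, SPLIT and find_currency are illustrative.

```python
import re

# The single lookbehind from the answer mixes a 13-character and a
# 1-character alternative, which the re module refuses to compile.
ORIGINAL = r'(?<=\d{6}-\d{6}|[A-Z ])[0-9,]+(?=-$)'
try:
    re.compile(ORIGINAL)
    original_compiles = True
except re.error:
    original_compiles = False

# Same idea, but each lookbehind now has a fixed width of its own.
SPLIT = re.compile(r'(?:(?<=\d{6}-\d{6})|(?<=[A-Z ]))[0-9,]+(?=-$)')

def find_currency(record):
    """Return the currency field, or None if the record does not match."""
    m = SPLIT.search(record)
    return m.group(0) if m else None

print(find_currency('24.02 24.02VALINTATALO MEGAHERTSI4,76-'))
print(find_currency('24.02 24.02DOE MRIDANG 157235-1234582,00-'))
```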

Upvotes: 0

Tim Pietzcker

Reputation: 336158

(?<=\b\d{6}-\d{6}|[^-\d])\d+?,\d\d

will match a "currency" that's either preceded by an account number or anything else (except for a hyphen). Is that sufficient?
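For use with Python's re module, which does not accept a lookbehind whose alternatives have different widths, one alternative is to consume the prefix in a non-capturing group and capture only the currency. This swaps the lookbehind for a capturing group, so it is a variation on the pattern above and not the same regex; the names GROUPED and find_currency_grouped are hypothetical.

```python
import re

# The prefix (account number, or any character that is not a hyphen
# or digit) is matched but not captured; group 1 holds the currency.
GROUPED = re.compile(r'(?:\b\d{6}-\d{6}|[^-\d])(\d+?,\d\d)')

def find_currency_grouped(record):
    """Return the captured currency, or None if nothing matches."""
    m = GROUPED.search(record)
    return m.group(1) if m else None

print(find_currency_grouped('24.02 24.02VALINTATALO MEGAHERTSI4,76-'))
print(find_currency_grouped('24.02 24.02DOE MRIDANG 157235-1234582,00-'))
```

Because the prefix is consumed, the overall match (group 0) includes one extra leading character or the account number; only group 1 should be read.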

Upvotes: 0

compie

Reputation: 10536

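import re

def extract_current(s):
    # Keep the text after the last space and drop the trailing dash.
    s = s[s.rfind(' ')+1:-1]
    # Remove the account number, if present.
    s = re.sub(r'\d{6}-\d{6}', '', s)
    # Remove the remaining uppercase letters, leaving only the currency.
    s = re.sub(r'[A-Z]+', '', s)
    return s

print(extract_current('24.02 24.02VALINTATALO MEGAHERTSI4,76-'))
print(extract_current('24.02 24.02DOE MRIDANG 157235-1234582,00-'))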

Output:

4,76
2,00

Upvotes: 1
