Reputation: 44998
I need to extract parts of a string using regex in Python.
I'm good with basic regex but I'm terrible at lookarounds. I've shown two sample records below. The last bit is always a currency field, e.g. in the first one it is 4,76 and in the second one it is 2,00. The second record also has an account number that follows the pattern \d{6}-\d{6}. Anything after that is the currency.
24.02 24.02VALINTATALO MEGAHERTSI4,76-
24.02 24.02DOE MRIDANG 157235-1234582,00-
Could you help me out with this regex? What I've written so far is given below but it considers everything after the 'dash' in the account number to be the currency.
.*?(\d\d\.\d\d)(.*?)\s*(?<!\d{6}-\d{6})(\d*,\d\d)
Thanks in advance
Upvotes: 0
Views: 253
Reputation: 3725
This seems to work:
.*?(\d\d\.\d\d)(.*?)(?:\d{6}-\d{6})?(\d*,\d\d)
Explanation: (?:\d{6}-\d{6}) matches the account number but doesn't capture it. The question mark after it allows the account number to be absent. The reason we don't want to capture the account number is that doing so would shift the group numbering: the amount would then be match.group(4) instead of match.group(3), and group 3 would be None whenever the account number is absent.
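As a quick check of the pattern above, here is a minimal sketch that runs it against the two sample records from the question. The helper name extract_amount is hypothetical and only there for illustration; the pattern itself is unchanged.

```python
import re

# Pattern from the answer above. The account number sits in an optional
# non-capturing group, so the amount is always capture group 3.
PATTERN = re.compile(r'.*?(\d\d\.\d\d)(.*?)(?:\d{6}-\d{6})?(\d*,\d\d)')

def extract_amount(record):
    """Return the currency amount of a record, or None if nothing matches.

    `extract_amount` is a hypothetical helper name used for this sketch.
    """
    m = PATTERN.match(record)
    return m.group(3) if m else None

samples = [
    '24.02 24.02VALINTATALO MEGAHERTSI4,76-',
    '24.02 24.02DOE MRIDANG 157235-1234582,00-',
]
for record in samples:
    print(extract_amount(record))
# prints 4,76 then 2,00
```

Note that group 2 here also contains the second date and the name, since only the first date is captured by group 1.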
Upvotes: 1
Reputation: 988
(?<=\d{6}-\d{6}|[A-Z ])[0-9,]+(?=-$)
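This regex matches the first run of digits and commas that is preceded by an account number, an uppercase letter, or a space, and that is followed by a dash as the last character of the line or string. Note that the two look-behind alternatives have different widths, which Python's built-in re module rejects (it requires fixed-width look-behinds), so the pattern needs an engine that supports variable-width look-behind, such as the third-party regex module.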
Upvotes: 0
Reputation: 336158
(?<=\b\d{6}-\d{6}|[^-\d])\d+?,\d\d
will match a "currency" amount that is preceded either by an account number or by any character other than a hyphen or a digit. Is that sufficient?
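Python's built-in re module only accepts fixed-width look-behinds, and the two alternatives above have different widths. One possible workaround, shown here as a sketch and not as the answerer's own method, is to split the alternation into two separate look-behinds, each of fixed width. The helper name find_amount is hypothetical.

```python
import re

# Assumption: the single variable-width look-behind is rewritten as two
# fixed-width look-behinds joined by alternation, which `re` accepts.
AMOUNT = re.compile(r'(?:(?<=\b\d{6}-\d{6})|(?<=[^-\d]))\d+?,\d\d')

def find_amount(record):
    """Return the first amount found in the record, or None.

    `find_amount` is a hypothetical helper name used for this sketch.
    """
    m = AMOUNT.search(record)
    return m.group(0) if m else None

print(find_amount('24.02 24.02VALINTATALO MEGAHERTSI4,76-'))     # 4,76
print(find_amount('24.02 24.02DOE MRIDANG 157235-1234582,00-'))  # 2,00
```

Because the match is the amount itself, no capture groups are needed and search can be used instead of match.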
Upvotes: 0
Reputation: 10536
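import re

def extract_current(s):
    # Keep the text after the last space, dropping the trailing dash.
    s = s[s.rfind(' ') + 1:-1]
    # Remove the account number, if present.
    s = re.sub(r'\d{6}-\d{6}', '', s)
    # Remove any remaining uppercase letters (the end of the name).
    s = re.sub(r'[A-Z]+', '', s)
    return s

print(extract_current('24.02 24.02VALINTATALO MEGAHERTSI4,76-'))
print(extract_current('24.02 24.02DOE MRIDANG 157235-1234582,00-'))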
Output:
4,76
2,00
Upvotes: 1