Reputation: 437
My RegEx can also be found here, although I manually inserted characters to look for carriage returns.
((?:\d{6}?)([A-Z\d]{3})?(?:[\^r\ ]+)(([A-Z\d]{6}|[A-Z\d]{5} |[A-Z\d]{4} ))?)
I've specified a trailing space after the 5- and 4-character alternatives, yet my regular expression seems to ignore it, at least on the first line. It matches "EXTEND" even though I don't want it to; a shorter string should only match when a space follows it. It does work on the third line, though, with "XOBUS ".
FPCN54 CWNT 080810^r^r EXTENDED FORE #should not match anything
ASUS42 KMHX 080425^r^r RWRMHX^r^r WEAT #should match RWRMHX
RXUS30 KWNO 081300^r^r XOBUS ^r^r GREA #should match XOBUS w/ 1 trailing space
FXUS64 KEWX 081112 RR3^r^r AFDEWX^r^r #should match RR3 and AFDEWX
Edit: Forgot to include a 3 character alphanumeric before the first carriage return. See line 4. Need to capture that as well.
Upvotes: 1
Views: 78
Reputation: 482
Based on your desired output above, you're overcomplicating the regex. BTW, above you said "5 or 4 character string," but your desired output also includes a 6-character string and a 3-character string.
>>> import re
>>> string = '''FPCN54 CWNT 080810^r^r EXTENDED FORE #should not match anything
... ASUS42 KMHX 080425^r^r RWRMHX^r^r WEAT #should match RWRMHX
... RXUS30 KWNO 081300^r^r XOBUS ^r^r GREA #should match XOBUS w/ 1 trailing space
... FXUS64 KEWX 081112 RR3^r^r RR3555^r^r AFDEWX^r^r #should match RR3, RR3555, and AFDEWX'''
>>> re.findall(r'(?m)([\d]*[A-Z]+(?:[A-Z]*[\d]*)*[\s]*)[\^r]{2,}', string)
#OUTPUT
['RWRMHX', 'XOBUS ', 'RR3', 'RR3555', 'AFDEWX']
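If it helps to see which line each match comes from, here is a minimal sketch that runs the same pattern line by line against the four sample lines from the question (without the trailing comments). The names `PATTERN`, `lines` and `results` are just for illustration, and the `^r` pairs are treated as literal caret-plus-r characters, as in the sample data.

```python
import re

# Same pattern as above, compiled once; the (?m) flag is not needed
# here because the pattern has no ^ or $ anchors.
PATTERN = re.compile(r'([\d]*[A-Z]+(?:[A-Z]*[\d]*)*[\s]*)[\^r]{2,}')

# The question's sample lines, with the "#should match ..." notes removed.
lines = [
    'FPCN54 CWNT 080810^r^r EXTENDED FORE',
    'ASUS42 KMHX 080425^r^r RWRMHX^r^r WEAT',
    'RXUS30 KWNO 081300^r^r XOBUS ^r^r GREA',
    'FXUS64 KEWX 081112 RR3^r^r AFDEWX^r^r',
]

# One list of captures per input line.
results = [PATTERN.findall(line) for line in lines]

for line, found in zip(lines, results):
    print(found, '<-', line)
```

A token is only captured when it is directly followed by at least two `^`/`r` characters, which is why "EXTENDED" on the first line is skipped and "XOBUS " keeps its trailing space.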
Upvotes: 1
Reputation: 27743
This RegEx might help you perform an exact match and divide your input strings into several groups, as you wish, so you can reconstruct your target outputs:
([A-Z0-9]{6})\s([A-Z]{4})\s([0-9]{6})([\^r])+\s([A-Z]+)([\^r\s]+)(.+)
You can remove the parentheses () from any group you don't need to capture and it would still match.
You can also relax the boundaries, if you wish.
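As a rough sketch of how the groups could be read in Python, assuming the character classes are written as `[\^r]` (inside a class, `|` is a literal pipe rather than alternation). `HEADER` and `fifth_group` are hypothetical names used only for this example.

```python
import re

# The pattern from this answer, with the character classes written as
# [\^r] so that a literal "|" is not accepted.
HEADER = re.compile(
    r'([A-Z0-9]{6})\s([A-Z]{4})\s([0-9]{6})([\^r])+\s([A-Z]+)([\^r\s]+)(.+)'
)

def fifth_group(line):
    """Return the token after the first ^r^r run, or None if no match."""
    m = HEADER.search(line)
    return m.group(5) if m else None

second = fifth_group('ASUS42 KMHX 080425^r^r RWRMHX^r^r WEAT')
first = fifth_group('FPCN54 CWNT 080810^r^r EXTENDED FORE')
fourth = fifth_group('FXUS64 KEWX 081112 RR3^r^r AFDEWX^r^r')
print(second, first, fourth)
```

Note that the fifth group is not limited in length, so on the first sample line it captures "EXTENDED", and the fourth sample line does not match at all because of the extra "RR3" before the first `^r^r`. Any filtering on length or trailing space would have to be done on the captured groups afterwards.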
Upvotes: 0