Reputation: 201
I'm new to regex but I seem to have things going my way.
https://regex101.com/r/Is8wZK/1 --- group 8 might have more than one word in it, separated by a space, but, as you can see, so does group 5, and I've exhausted my one-time usage of (.+)
How can I rewrite my regex to detect group 8 in exactly the way group 5 is detected?
Upvotes: 2
Views: 55
Reputation: 126
^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+((?:[[:alpha:]]+)(?:\s+[[:alpha:]]+)*)\s+(\S+)\s+(\S+)\s+((?:[[:alpha:]]+)(?:\s+[[:alpha:]]+)*)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)$
Link: https://regex101.com/r/v4mEJK/1
Pretty much all you need to do is match a group of alphabetic characters followed by zero or more groups of whitespace plus alphabetic characters, which captures names that may or may not have more than one word; this is done by using
((?:[[:alpha:]]+)(?:\s+[[:alpha:]]+)*)
for groups 5 and 8.
The rest of the regex could possibly be made more specific, but there isn't really any need to add more complexity unless your input text is significantly more complex than your test case.
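As a rough sketch of how this behaves outside regex101, here is the same pattern in Python. Two things are assumptions on my part: the sample line is made up (the original test data isn't reproduced here), and `[A-Za-z]` stands in for `[[:alpha:]]`, because Python's `re` module does not support POSIX character classes.

```python
import re

# [A-Za-z] replaces [[:alpha:]], which Python's re module does not support.
WORDS = r"((?:[A-Za-z]+)(?:\s+[A-Za-z]+)*)"

PATTERN = re.compile(
    r"^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+" + WORDS      # groups 1-5
    + r"\s+(\S+)\s+(\S+)\s+" + WORDS                  # groups 6-8
    + r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)$"            # groups 9-12
)

# Hypothetical input line, invented for illustration only.
line = "12/05/2017 A123 45 678 John Smith 12ABC345 9 New York City B 1.50 2.75 3.00"

match = PATTERN.match(line)
name_a = match.group(5)   # first multi-word field
name_b = match.group(8)   # second multi-word field
print(name_a, "|", name_b)
```

Because groups 5 and 8 only accept alphabetic words, the engine has a single way to split the line, so both multi-word fields come out whole.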
FWIW:
It's far better to use \s+
instead of a raw space between groups so you can match other delimiting whitespace.
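To illustrate the difference, here is a small Python sketch with a hypothetical two-field pattern, written once with a literal space and once with `\s+`. The inputs are invented for the example.

```python
import re

# Hypothetical two-field patterns: literal space vs. \s+ as the delimiter.
with_space = re.compile(r"^(\S+) (\S+)$")
with_ws = re.compile(r"^(\S+)\s+(\S+)$")

tabbed = "A123\t45"     # fields separated by a tab
padded = "A123   45"    # fields separated by several spaces

# The literal-space version rejects both lines; the \s+ version accepts both.
space_hits = [bool(with_space.match(s)) for s in (tabbed, padded)]
ws_hits = [bool(with_ws.match(s)) for s in (tabbed, padded)]
print(space_hits, ws_hits)
```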
Upvotes: 2
Reputation: 11805
I redid your generic capture groups into this:
^(\d+\/\d+\/\d+) ([A-Z]\d+) (\d+) (\d+) (.+) (\d+[A-Z]{3}\d+) (\d+) (.+) ([A-Z]) (\d+\.\d+) (\d+\.\d+) (\d+\.\d+)$
Breaking that down:
(\d+\/\d+\/\d+): this matches the date
([A-Z]\d+): this matches a capital followed by some numbers
(\d+): this matches a number
(\d+): this matches a number
(.+): this is the first general group
(\d+[A-Z]{3}\d+): this matches any number followed by 3 capitals followed by any number
(\d+): this matches a number
(.+): this is the second general group
([A-Z]): this matches a single capital
(\d+\.\d+): this matches a number with a decimal point
(\d+\.\d+): this matches a number with a decimal point
(\d+\.\d+): this matches a number with a decimal point
This should help you get what you want.
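For what it's worth, here is a minimal Python sketch of why two `(.+)` groups can coexist once the fields around them are specific. The sample line is an assumption, invented to fit the field shapes described above, not your actual data.

```python
import re

PATTERN = re.compile(
    r"^(\d+\/\d+\/\d+) ([A-Z]\d+) (\d+) (\d+) (.+) "
    r"(\d+[A-Z]{3}\d+) (\d+) (.+) ([A-Z]) "
    r"(\d+\.\d+) (\d+\.\d+) (\d+\.\d+)$"
)

# Hypothetical input line, invented for illustration only.
line = "12/05/2017 A123 45 678 John Smith 12ABC345 9 New York City B 1.50 2.75 3.00"

match = PATTERN.match(line)
first_general = match.group(5)    # bounded on the right by the digits+capitals+digits field
second_general = match.group(8)   # bounded on the right by the single capital
print(first_general, "|", second_general)
```

Each `(.+)` starts out greedy, but the specific fields on either side force it to backtrack until it covers only the free-text part.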
If you are only interested in groups 5 and 8, try non capturing groups:
^(?:\d+\/\d+\/\d+) (?:[A-Z]\d+) (?:\d+) (?:\d+) (.+) (?:\d+[A-Z]{3}\d+) (?:\d+) (.+) (?:[A-Z]) (?:\d+\.\d+) (?:\d+\.\d+) (?:\d+\.\d+)$
Or only group what you need:
^\d+\/\d+\/\d+ [A-Z]\d+ \d+ \d+ (.+) \d+[A-Z]{3}\d+ \d+ (.+) [A-Z] \d+\.\d+ \d+\.\d+ \d+\.\d+$
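With only two capturing groups left, the match can be unpacked directly. A short Python sketch, again using a made-up sample line:

```python
import re

PATTERN = re.compile(
    r"^\d+\/\d+\/\d+ [A-Z]\d+ \d+ \d+ (.+) "
    r"\d+[A-Z]{3}\d+ \d+ (.+) [A-Z] "
    r"\d+\.\d+ \d+\.\d+ \d+\.\d+$"
)

# Hypothetical input line, invented for illustration only.
line = "12/05/2017 A123 45 678 John Smith 12ABC345 9 New York City B 1.50 2.75 3.00"

match = PATTERN.match(line)
# groups() now holds just the two free-text fields, in order.
first_general, second_general = match.groups()
print(first_general, "|", second_general)
```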
Upvotes: 1