David Kachlon

Reputation: 201

Regex Situation... More than one group with variable spaces

I'm new to regex but I seem to have things going my way.

Here is what I have so far: https://regex101.com/r/Is8wZK/1. Group 8 might have more than one word in it, separated by spaces, but, as you can see, so does group 5, and I've already used up my one (.+).

How can I re-write my regex to detect group 8 in exactly the way group 5 is detected?

Upvotes: 2

Views: 55

Answers (2)

Joey Pabalinas

Reputation: 126

^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+((?:[[:alpha:]]+)(?:\s+[[:alpha:]]+)*)\s+(\S+)\s+(\S+)\s+((?:[[:alpha:]]+)(?:\s+[[:alpha:]]+)*)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)$

Link: https://regex101.com/r/v4mEJK/1

Pretty much all you need to do is match a run of alphabetic characters followed by zero or more repetitions of whitespace plus another run of alphabetic characters. That captures names which may or may not have more than one word. This is done by using

((?:[[:alpha:]]+)(?:\s+[[:alpha:]]+)*)

for groups 5 and 8.
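
To try the sub-pattern on its own, here is a minimal Python sketch. Python's re module does not support the POSIX class [[:alpha:]], so [A-Za-z] stands in for it (which assumes ASCII-only names), and the sample strings are made up for illustration.

```python
import re

# [A-Za-z] stands in for [[:alpha:]], which Python's re module does not support.
NAME = re.compile(r"((?:[A-Za-z]+)(?:\s+[A-Za-z]+)*)")

# Hypothetical sample values, not taken from the original test data.
one_word = NAME.fullmatch("Smith")
three_words = NAME.fullmatch("Jane Mary Doe")
with_digits = NAME.fullmatch("Jane 42")  # "42" is not alphabetic

print(one_word.group(1))     # Smith
print(three_words.group(1))  # Jane Mary Doe
print(with_digits)           # None
```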

The rest of the regex could possibly be made more specific, but there isn't really any need to add more complexity unless your input text is significantly more complex than your test case.

FWIW: It's far better to use \s+ instead of a literal space between groups, so that tabs or runs of several spaces between fields still match.
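
As a rough illustration of the \s+ point, the sketch below builds the same 12-group layout in Python and runs it against a made-up line containing a tab and a double space. The input line is hypothetical, and [A-Za-z] again replaces [[:alpha:]] because Python's re module lacks POSIX classes.

```python
import re

WORDS = r"((?:[A-Za-z]+)(?:\s+[A-Za-z]+)*)"  # stand-in for the [[:alpha:]] version
TOKEN = r"(\S+)"

# Same layout as the regex above: 4 tokens, name, 2 tokens, name, 4 tokens.
parts = [TOKEN] * 4 + [WORDS] + [TOKEN] * 2 + [WORDS] + [TOKEN] * 4
flexible = re.compile("^" + r"\s+".join(parts) + "$")  # \s+ between groups
strict = re.compile("^" + " ".join(parts) + "$")       # one literal space between groups

# Hypothetical line with a tab and a double space between some fields.
line = "01/02/2018\tA123  45 678 John Smith 12ABC34 9 Jane Mary Doe X 1.50 2.25 3.00"

m = flexible.match(line)
print(m.group(5))          # John Smith
print(m.group(8))          # Jane Mary Doe
print(strict.match(line))  # None: the tab is not a literal space
```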

Upvotes: 2

Isaac

Reputation: 11805

I redid your generic capture groups into this:

^(\d+\/\d+\/\d+) ([A-Z]\d+) (\d+) (\d+) (.+) (\d+[A-Z]{3}\d+) (\d+) (.+) ([A-Z]) (\d+\.\d+) (\d+\.\d+) (\d+\.\d+)$

Breaking that down:

  • (\d+\/\d+\/\d+): this matches the date
  • ([A-Z]\d+): this matches a capital letter followed by one or more digits
  • (\d+): this matches a number
  • (\d+): this matches a number
  • (.+): this is the first general group
  • (\d+[A-Z]{3}\d+): this matches one or more digits, then 3 capital letters, then one or more digits
  • (\d+): this matches a number
  • (.+): this is the second general group
  • ([A-Z]): this matches a single capital letter
  • (\d+\.\d+): this matches a number with a decimal point
  • (\d+\.\d+): this matches a number with a decimal point
  • (\d+\.\d+): this matches a number with a decimal point

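This should help you get what you want. Because the groups on either side of each (.+) are now specific, each (.+) can only stretch as far as its neighbors allow, so both group 5 and group 8 can hold more than one word.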
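
A quick way to check this outside regex101 is a small Python script. The input line below is a made-up example shaped to fit the pattern, not the actual test data from the question.

```python
import re

# The regex from above, split across two string literals for readability.
PATTERN = re.compile(
    r"^(\d+\/\d+\/\d+) ([A-Z]\d+) (\d+) (\d+) (.+) (\d+[A-Z]{3}\d+) (\d+) (.+) "
    r"([A-Z]) (\d+\.\d+) (\d+\.\d+) (\d+\.\d+)$"
)

# Hypothetical input line; the real test data lives behind the regex101 link.
line = "01/02/2018 A123 45 678 John Smith 12ABC34 9 Jane Mary Doe X 1.50 2.25 3.00"

m = PATTERN.match(line)
print(m.group(5))  # John Smith
print(m.group(8))  # Jane Mary Doe
```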


If you are only interested in groups 5 and 8, try non-capturing groups (the two general groups then become groups 1 and 2):

^(?:\d+\/\d+\/\d+) (?:[A-Z]\d+) (?:\d+) (?:\d+) (.+) (?:\d+[A-Z]{3}\d+) (?:\d+) (.+) (?:[A-Z]) (?:\d+\.\d+) (?:\d+\.\d+) (?:\d+\.\d+)$

Or only group what you need:

^\d+\/\d+\/\d+ [A-Z]\d+ \d+ \d+ (.+) \d+[A-Z]{3}\d+ \d+ (.+) [A-Z] \d+\.\d+ \d+\.\d+ \d+\.\d+$
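
With only two capturing groups left, the match unpacks directly into two values. A minimal Python sketch, again using a made-up input line and hypothetical variable names:

```python
import re

# The "only group what you need" regex from above.
PATTERN = re.compile(
    r"^\d+\/\d+\/\d+ [A-Z]\d+ \d+ \d+ (.+) \d+[A-Z]{3}\d+ \d+ (.+) "
    r"[A-Z] \d+\.\d+ \d+\.\d+ \d+\.\d+$"
)

# Hypothetical input line, not the data from the question.
line = "01/02/2018 A123 45 678 John Smith 12ABC34 9 Jane Mary Doe X 1.50 2.25 3.00"

# groups() returns exactly the two captured names.
first_name, second_name = PATTERN.match(line).groups()
print(first_name)   # John Smith
print(second_name)  # Jane Mary Doe
```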

Upvotes: 1
