Charlie Clark

Reputation: 19507

Regex with multiple optional groups

I've searched but can't find a working solution to this. I'm working on a regex for the header formatting used in Excel files. Headers and footers use &-commands for formatting, and the left, centre and right parts are simply joined together:

(¶18.3.1.39 in the ECMA specification)

&L&"Lucida Grande,Standard"&K000000Left top&C&"Lucida Grande,Standard"&K000000Middle top&R&"Lucida Grande,Standard"&K000000Right top

All three parts are optional. Based on what I've read about making groups optional I've come up with the following regex (Python style):

re.compile(r"""
(?P<left>&L.+?)
(?P<center>&C.+?)
(?P<right>&R.+?)$
""", re.VERBOSE)

But it fails on a simple string containing just one part, such as &Ltest header. I think I understand the underlying problem (the patterns for missing optional groups affect the other patterns), but not the syntax, or more exactly what happens when an optional group is missing.
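A minimal sketch that reproduces the failure, assuming Python's standard `re` module; the variable names are only illustrative. Because all three groups are mandatory in this pattern, a string with only a left part cannot match.

```python
import re

# The pattern from the question: every group is required.
HEADER_RE = re.compile(r"""
(?P<left>&L.+?)
(?P<center>&C.+?)
(?P<right>&R.+?)$
""", re.VERBOSE)

full = ('&L&"Lucida Grande,Standard"&K000000Left top'
        '&C&"Lucida Grande,Standard"&K000000Middle top'
        '&R&"Lucida Grande,Standard"&K000000Right top')

# All three parts are present, so this matches.
full_match = HEADER_RE.match(full)

# No &C or &R in the input, so the required groups cannot match.
partial_match = HEADER_RE.match("&Ltest header")

print(full_match is not None)
print(partial_match)
```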

Upvotes: 0

Views: 2558

Answers (2)

user557597

You could use a regex that matches left/center/right with a series of alternations.
A conditional ensures each part is matched at most once, irrespective of the order in which the parts appear in the line.
This makes it possible to match 1, 2, or 3 of them.

updated

Modified to match each section up until the next section (if it's there).
Based on the info about conditionals from here -> http://www.rexegg.com/regex-conditionals.html

If it's Python/PCRE this should work:

(?:(?:[^&]|&[\S\s])*?(?:&L(?P<left>(?(left)(?!))(?:[^&]|&[^LCR])*)|&C(?P<center>(?(center)(?!))(?:[^&]|&[^LCR])*)|&R(?P<right>(?(right)(?!))(?:[^&]|&[^LCR])*))){1,3}  

If it's Perl/PCRE, this works:

  # (?:(?:[^&]|&[\S\s])*?(?:&L(?<left>(?(<left>)(?!))(?:[^&]|&[^LCR])*)|&C(?<center>(?(<center>)(?!))(?:[^&]|&[^LCR])*)|&R(?<right>(?(<right>)(?!))(?:[^&]|&[^LCR])*))){1,3}

 (?:
      (?: [^&] | & [\S\s] )*?       # Get all possible quoted &&
                                    # even &[LCR] if needed
      (?:                           # Get one of   &L or &C or &R
           &L
           (?<left>                      # (1), Left
                (?(<left>)
                     (?!)                          # Allow only 1 left
                )
                (?: [^&] | & [^LCR] )*        # Get all possible quoted && up to but not &[LCR]
           )
        |  
           &C
           (?<center>                    # (2), Center
                (?(<center>)
                     (?!)                          # Allow only 1 center
                )
                (?: [^&] | & [^LCR] )*
           )
        |  
           &R
           (?<right>                     # (3), Right
                (?(<right>)
                     (?!)                          # Allow only 1 right
                )
                (?: [^&] | & [^LCR] )*
           )
      )
 ){1,3}                        # Do 1 to 3 times

Output:

 **  Grp 0 -  ( pos 0 , len 132 ) 
&L&"Lucida Grande,Standard"&K000000Left top&C&"Lucida Grande,Standard"&K000000Middle top&R&"Lucida Grande,Standard"&K000000Right top  
 **  Grp 1 -  ( pos 2 , len 41 ) 
&"Lucida Grande,Standard"&K000000Left top  
 **  Grp 2 -  ( pos 45 , len 43 ) 
&"Lucida Grande,Standard"&K000000Middle top  
 **  Grp 3 -  ( pos 90 , len 42 ) 
&"Lucida Grande,Standard"&K000000Right top  
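The same order-independent split can be sketched in Python without conditionals, by looping over `re.finditer` and keeping the first occurrence of each section. This is a different technique from the conditional pattern above, not a port of it: the function name `split_header` and the dictionary keys are assumptions for illustration, and only the body sub-pattern `(?:[^&]|&[^LCR])*` is taken from the answer. Unlike the pattern above, this sketch does not skip an escaped `&&` that sits directly before an `L`, `C` or `R`.

```python
import re

# One section: &L, &C or &R followed by text that runs up to the next
# &L/&C/&R marker. Other &-commands (such as &K or &") stay in the body.
SECTION_RE = re.compile(r"&(?P<tag>[LCR])(?P<body>(?:[^&]|&[^LCR])*)")

NAMES = {"L": "left", "C": "center", "R": "right"}


def split_header(text):
    """Return a dict of the sections found, in whatever order they appear."""
    sections = {}
    for m in SECTION_RE.finditer(text):
        name = NAMES[m.group("tag")]
        # Keep only the first occurrence of each section.
        sections.setdefault(name, m.group("body"))
    return sections


print(split_header("&Ltest header"))
```

Sections that are absent from the input are simply missing from the returned dictionary.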

Upvotes: 1

Aran-Fey

Reputation: 43166

Try

^(?:.*?(?P<left>&L.[^&]*))?(?:.*?(?P<center>&C.[^&]*))?(?:.*?(?P<right>&R.[^&]*))?.*$

regex101 demo.


Explanation of the left group (center and right are pretty much the same):

(?:
    .*? # consume any preceding text
    (?P<left> # then capture...
        &L # "&L" literally
        . # the character after that
        [^&]* # and then everything up to the next "&" character
    )
)? # and make the whole thing optional.
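A short sketch of how this pattern behaves with Python's `re` module; the variable names are only illustrative. It shows that a missing group comes back as `None`, and that each capture stops at the next `&`, which cuts off sections containing further &-commands.

```python
import re

PATTERN = re.compile(
    r"^(?:.*?(?P<left>&L.[^&]*))?"
    r"(?:.*?(?P<center>&C.[^&]*))?"
    r"(?:.*?(?P<right>&R.[^&]*))?.*$"
)

# Only the left part is present, so center and right are None.
simple = PATTERN.match("&Ltest header").groupdict()

# The left capture stops at the "&" that starts &K000000.
formatted = PATTERN.match(
    '&L&"Lucida Grande,Standard"&K000000Left top'
).groupdict()

print(simple)
print(formatted["left"])
```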

P.S.: Your pattern didn't make any of the groups optional. You should've put the ? after the group, like this: (?P<left>&L.+)?


UPDATE

Since the groups aren't supposed to end at the next & character, you can try the pattern

(?P<left>&L.+?)?(?P<center>&C.+?)?(?P<right>&R.+?)?$

instead. All I did was to make all groups optional by adding a ?, and forcing the pattern to consume the entire string by putting the anchor $ at the end.

regex101 demo.
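As a quick check in Python (a sketch, with illustrative variable names), the optional groups now let any subset of the parts match. It uses `re.match`, so the pattern is tried from the start of the string only. Note that `.` does not match line breaks by default, so headers containing newlines would presumably need `re.DOTALL`.

```python
import re

PATTERN = re.compile(r"(?P<left>&L.+?)?(?P<center>&C.+?)?(?P<right>&R.+?)?$")

# Only a left part: the lazy .+? is stretched to the end by the $ anchor.
only_left = PATTERN.match("&Ltest header").groupdict()

# No left part: that group is skipped and reported as None.
center_right = PATTERN.match("&Cmiddle&Rright").groupdict()

print(only_left)
print(center_right)
```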

Update: (?:&L(?P<left>.+?))?(?:&C(?P<center>.+?))?(?:&R(?P<right>.+?))?$ won't capture the &L, &C and &R bits.
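Wrapped in a small helper, this last variant might be used as follows. This is only a sketch: the name `parse_header` and the choice to return `None` for input that does not match are assumptions, not part of the answer.

```python
import re

HEADER_RE = re.compile(
    r"(?:&L(?P<left>.+?))?(?:&C(?P<center>.+?))?(?:&R(?P<right>.+?))?$"
)


def parse_header(text):
    """Return the left/center/right texts without their &L/&C/&R markers.

    Parts that are absent map to None; returns None if text does not match.
    """
    m = HEADER_RE.match(text)
    return m.groupdict() if m else None


print(parse_header("&Ltest header"))
```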

Upvotes: 3
