Reputation: 19507
I've searched but can't find a working solution to this. I'm working on a regex for the header formatting in Excel files. These use &-commands to format headers and footers, and the left, centre and right headers are simply joined together:
(¶18.3.1.39 in the ECMA specification)
&L&"Lucida Grande,Standard"&K000000Left top&C&"Lucida Grande,Standard"&K000000Middle top&R&"Lucida Grande,Standard"&K000000Right top
All three parts are optional. Based on what I've read about making groups optional I've come up with the following regex (Python style):
re.compile(r"""
(?P<left>&L.+?)
(?P<center>&C.+?)
(?P<right>&R.+?)$
""", re.VERBOSE)
But it fails with a simple string containing just one part, such as &Ltest header. I think I understand the underlying problem (the patterns for missing optional groups affect the other patterns) but not the syntax, or more exactly what happens when an optional group is missing.
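To reproduce the problem, here is a minimal sketch in Python. The `full` and `partial` strings are made-up sample values chosen for illustration; the pattern is the one shown above.

```python
import re

# The pattern from the question: all three groups are mandatory here,
# because nothing marks them as optional.
ORIGINAL = re.compile(r"""
(?P<left>&L.+?)
(?P<center>&C.+?)
(?P<right>&R.+?)$
""", re.VERBOSE)

full = "&Lleft&Ccenter&Rright"   # illustrative: all three parts present
partial = "&Ltest header"        # only the left part present

full_match = ORIGINAL.match(full)
partial_match = ORIGINAL.match(partial)

print(full_match.groupdict())    # all three groups are captured
print(partial_match)             # None: &C and &R are required but absent
```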
Upvotes: 0
Views: 2558
Reputation:
You could use a regex that matches left/center/right with a series of alternations.
A conditional is used to match the parts irrespective of the order in which they appear in the line.
This makes it possible to match 1, 2, or 3 of them.
Update:
Modified to match each section up until the next section (if it's there).
Based on the info about conditionals at http://www.rexegg.com/regex-conditionals.html
If it's Python/PCRE, this should work:
(?:(?:[^&]|&[\S\s])*?(?:&L(?P<left>(?(left)(?!))(?:[^&]|&[^LCR])*)|&C(?P<center>(?(center)(?!))(?:[^&]|&[^LCR])*)|&R(?P<right>(?(right)(?!))(?:[^&]|&[^LCR])*))){1,3}
If it's Perl/PCRE, this works:
# (?:(?:[^&]|&[\S\s])*?(?:&L(?<left>(?(<left>)(?!))(?:[^&]|&[^LCR])*)|&C(?<center>(?(<center>)(?!))(?:[^&]|&[^LCR])*)|&R(?<right>(?(<right>)(?!))(?:[^&]|&[^LCR])*))){1,3}
(?:
     (?: [^&] | & [\S\s] )*?                # Get all possible quoted &&
                                            # even &[LCR] if needed
     (?:                                    # Get one of &L or &C or &R
          &L
          (?<left>                          # (1), Left
               (?(<left>)
                    (?!)                    # Allow only 1 left
               )
               (?: [^&] | & [^LCR] )*       # Get all possible quoted && up to but not &[LCR]
          )
       |
          &C
          (?<center>                        # (2), Center
               (?(<center>)
                    (?!)                    # Allow only 1 center
               )
               (?: [^&] | & [^LCR] )*
          )
       |
          &R
          (?<right>                         # (3), Right
               (?(<right>)
                    (?!)                    # Allow only 1 right
               )
               (?: [^&] | & [^LCR] )*
          )
     )
){1,3}                                      # Do 1 to 3 times
Output:
** Grp 0 - ( pos 0 , len 132 )
&L&"Lucida Grande,Standard"&K000000Left top&C&"Lucida Grande,Standard"&K000000Middle top&R&"Lucida Grande,Standard"&K000000Right top
** Grp 1 - ( pos 2 , len 41 )
&"Lucida Grande,Standard"&K000000Left top
** Grp 2 - ( pos 45 , len 43 )
&"Lucida Grande,Standard"&K000000Middle top
** Grp 3 - ( pos 90 , len 42 )
&"Lucida Grande,Standard"&K000000Right top
Upvotes: 1
Reputation: 43166
Try
^(?:.*?(?P<left>&L.[^&]*))?(?:.*?(?P<center>&C.[^&]*))?(?:.*?(?P<right>&R.[^&]*))?.*$
Explanation of the left group (center and right are pretty much the same):
(?:
.*? # consume any preceding text
(?P<left> # then capture...
&L # "&L" literally
. # the character after that
[^&]* # and then everything up to the next "&" character
)
)? # and make the whole thing optional.
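A minimal sketch of trying the first pattern from this answer in Python. The helper name `split_header` and the sample strings are hypothetical, picked only for illustration. The last call shows the limitation of `[^&]*`: a group stops at the second "&" after its marker, so nested formatting codes get cut off.

```python
import re

# First pattern from this answer, unchanged.
PATTERN = re.compile(
    r"^(?:.*?(?P<left>&L.[^&]*))?"
    r"(?:.*?(?P<center>&C.[^&]*))?"
    r"(?:.*?(?P<right>&R.[^&]*))?.*$"
)

def split_header(text):
    """Hypothetical helper: return the named groups, or {} if nothing matches."""
    m = PATTERN.match(text)
    return m.groupdict() if m else {}

# A single part: the missing groups come back as None.
only_left = split_header("&Ltest header")

# Two parts, no left header.
no_left = split_header("&Ccenter&Rright")

# Nested formatting codes: the capture ends before "&K000000".
truncated = split_header('&L&"Lucida Grande,Standard"&K000000Left top')

print(only_left)
print(no_left)
print(truncated)
```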
P.S.: Your pattern didn't make any of the groups optional. You should've put the ? after the group, like (?P<left>&L.+)?.
Since the groups aren't supposed to end at the next & character, you can try the pattern
(?P<left>&L.+?)?(?P<center>&C.+?)?(?P<right>&R.+?)?$
instead. All I did was make all groups optional by adding a ? after each one, and force the pattern to consume the entire string by putting the anchor $ at the end.
Update: (?:&L(?P<left>.+?))?(?:&C(?P<center>.+?))?(?:&R(?P<right>.+?))?$ won't capture the &L, &C and &R bits.
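As a rough check, here is a sketch that runs both optional-group patterns from this answer against the sample header from the question and against the one-part string that originally failed. The constant names are illustrative; `match()` is used so that matching starts at the beginning of the string.

```python
import re

# Groups keep their &L / &C / &R prefix.
WITH_PREFIX = re.compile(
    r"(?P<left>&L.+?)?(?P<center>&C.+?)?(?P<right>&R.+?)?$"
)
# Prefix moved outside the named group, so only the content is captured.
WITHOUT_PREFIX = re.compile(
    r"(?:&L(?P<left>.+?))?(?:&C(?P<center>.+?))?(?:&R(?P<right>.+?))?$"
)

# Sample header taken from the question.
SAMPLE = (
    '&L&"Lucida Grande,Standard"&K000000Left top'
    '&C&"Lucida Grande,Standard"&K000000Middle top'
    '&R&"Lucida Grande,Standard"&K000000Right top'
)

full = WITHOUT_PREFIX.match(SAMPLE).groupdict()
partial = WITHOUT_PREFIX.match("&Ltest header").groupdict()
partial_prefixed = WITH_PREFIX.match("&Ltest header").groupdict()

print(full["left"])      # content of the left header, without "&L"
print(partial)           # center and right are None
print(partial_prefixed)  # left still carries the "&L" prefix
```

Because each lazy `.+?` only stops where the following optional group or the `$` anchor can match, nested codes such as `&K000000` stay inside the section they belong to.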
Upvotes: 3