Reputation: 133
I need to implement a function performing the transformation:
Characters '{', '}', '|', '\' have a special meaning. Characters enclosed in { and } should be treated as a set of |-delimited options, one of which should be chosen at random and used to replace the enclosure in the resulting string. Enclosures may be nested. '\' should be interpreted as a literal escaping character, so that braces and pipes preceded by it will not be parsed as part of the source string syntax.
For example:
Example 1:
'ab{c|d|e}'
Possible results:
'abc'
'abd'
'abe'
Example 2:
'a{b{c|d}|e{f|g}}h'
Possible results:
'abch'
'abdh'
'aefh'
'aegh'
Example 3:
'a{b|c}d{e|f}h'
Possible results:
'abdeh'
'abdfh'
'acdeh'
'acdfh'
Example 4:
'{\{|\||\}}'
Possible results:
'{'
'|'
'}'
I hope I've explained this clearly. If you see what I'm trying to achieve and can think of a better way to do it, I'd be very grateful.
Upvotes: 0
Views: 56
Reputation:
If there is no nesting and things are kept simple, you should be able to
construct the literal outputs by using the \G construct plus some
logic based on capture groups.
Note that this only parses the isolated group, not whatever follows it.
Treat it as one sub-part of what you want to parse.
You can remove the ^ anchor and add logic for mid-string usage.
But, as said, this is just one piece.
Usage for the regex below:
Use a while loop, searching for each match.
In the body of the loop, these are the triggers:
- If group 1 matched, store it as the prefix value
(group 2 will match at the same time).
- If group 2 matched, this is the next alternative; combine it with the prefix
to form a unique part.
- Group 3 will always match. It is the indicator of where you are:
it matches either an OR bar | or an end brace }.
Keep caching the temporary unique parts until the end brace is found.
Once all the matches are done, the unique parts are valid only if the end
brace was encountered, i.e., the whole thing parsed OK.
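The loop described above might be sketched in Python roughly as follows. This is an illustration under assumptions, not a drop-in translation: Python's built-in re module has no \G, so the pattern given later in this answer is split into two hypothetical pieces (FIRST and NEXT), and Pattern.match(text, pos) is used to force each attempt to start where the previous one ended. The name parse_group is made up for this sketch.

```python
import re

# Assumption: FIRST is the "^prefix{" branch plus the first option,
# NEXT is the "\G" branch (option followed by | or }).
FIRST = re.compile(
    r'([^|{}\\]*(?:\\.[^|{}\\]*)*)\{([^|}\\]*(?:\\.[^|}\\]*)*)(\||\})')
NEXT = re.compile(r'([^|}\\]*(?:\\.[^|}\\]*)*)(\||\})')

def parse_group(text):
    """Return (parts, end_pos) for a leading 'prefix{a|b|c}', else None."""
    m = FIRST.match(text)
    if m is None:
        return None
    prefix, option, sep = m.groups()
    parts = [prefix + option]          # first unique part
    pos = m.end()
    while sep == '|':                  # keep going until the end brace
        m = NEXT.match(text, pos)      # anchored at pos, like \G
        if m is None:
            return None                # end brace never found: invalid
        option, sep = m.groups()
        parts.append(prefix + option)
        pos = m.end()
    return parts, pos
```

As in the regex approach, the returned parts still contain their backslash escapes, and anything after the closing brace is left for the caller to handle.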
In each part, you will need to remove any escapes, as they are not stripped
in the match.
Find: (?<!\\)\\\K\\
'~(?<!\\\)\\\\\K\\\\~'
Replace: nothing
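For the unescaping step, a minimal Python sketch is shown here. It is not a translation of the Find pattern above (Python's built-in re has no \K); it assumes that every backslash simply escapes the character after it, and uses a capture group to keep that character. The name strip_escapes is hypothetical.

```python
import re

def strip_escapes(part):
    # Drop each escaping backslash and keep the character it escapes.
    # DOTALL lets an escaped newline be handled the same way.
    return re.sub(r'\\(.)', r'\1', part, flags=re.DOTALL)
```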
Good Luck!
The regex:
(?:(?!^)\G|^([^\|{}\\]*(?:\\.[^\|{}\\]*)*){)([^\|}\\]*(?:\\.[^\|}\\]*)*)(\||})
'~(?:(?!^)\G|^([^\|{}\\\]*(?:\\\.[^\|{}\\\]*)*){)([^\|}\\\]*(?:\\\.[^\|}\\\]*)*)(\||})~'
Formatted:
 (?:
      (?! ^ )
      \G
   |
      ^
      (                             # (1 start)
           [^\|{}\\]*
           (?: \\ . [^\|{}\\]* )*
      )                             # (1 end)
      {
 )
 (                             # (2 start)
      [^\|}\\]*
      (?: \\ . [^\|}\\]* )*
 )                             # (2 end)
 ( \| | } )                    # (3)
Example 4 Output:
** Grp 0 - ( pos 0 , len 4 )
{\{|
** Grp 1 - ( pos 0 , len 0 ) EMPTY
** Grp 2 - ( pos 1 , len 2 )
\{
** Grp 3 - ( pos 3 , len 1 )
|
------------------
** Grp 0 - ( pos 4 , len 3 )
\||
** Grp 1 - NULL
** Grp 2 - ( pos 4 , len 2 )
\|
** Grp 3 - ( pos 6 , len 1 )
|
-------------------
** Grp 0 - ( pos 7 , len 3 )
\}}
** Grp 1 - NULL
** Grp 2 - ( pos 7 , len 2 )
\}
** Grp 3 - ( pos 9 , len 1 )
}
Example 1 Output:
** Grp 0 - ( pos 0 , len 5 )
ab{c|
** Grp 1 - ( pos 0 , len 2 )
ab
** Grp 2 - ( pos 3 , len 1 )
c
** Grp 3 - ( pos 4 , len 1 )
|
-------------------
** Grp 0 - ( pos 5 , len 2 )
d|
** Grp 1 - NULL
** Grp 2 - ( pos 5 , len 1 )
d
** Grp 3 - ( pos 6 , len 1 )
|
-------------------
** Grp 0 - ( pos 7 , len 2 )
e}
** Grp 1 - NULL
** Grp 2 - ( pos 7 , len 1 )
e
** Grp 3 - ( pos 8 , len 1 )
}
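The regex above deliberately leaves nesting out. For the nested case from the question, one alternative (a different technique, not an extension of the regex) is a small hand-written recursive parser. The Python sketch below is only an illustration: the name expand and the rng parameter are made up, and it assumes that an unescaped | or } outside any braces is kept as literal text and that an unclosed { is an error.

```python
import random

def expand(text, rng=random):
    """Expand '{a|b{c|d}}' style templates, honouring backslash escapes."""
    pos = 0

    def parse_sequence(stop_chars):
        # Read literal text and nested groups until an unescaped stop char.
        nonlocal pos
        out = []
        while pos < len(text) and text[pos] not in stop_chars:
            ch = text[pos]
            if ch == '\\' and pos + 1 < len(text):
                out.append(text[pos + 1])   # escaped char, taken literally
                pos += 2
            elif ch == '{':
                pos += 1
                out.append(parse_group())   # nested enclosure
            else:
                out.append(ch)
                pos += 1
        return ''.join(out)

    def parse_group():
        # Called just after '{': collect the options, then pick one.
        nonlocal pos
        options = [parse_sequence('|}')]
        while pos < len(text) and text[pos] == '|':
            pos += 1
            options.append(parse_sequence('|}'))
        if pos >= len(text):
            raise ValueError('unbalanced "{"')
        pos += 1                            # skip the closing '}'
        return rng.choice(options)

    return parse_sequence('')               # top level: no stop chars
```

Calling expand('a{b{c|d}|e{f|g}}h') should return one of the four strings listed in Example 2 of the question; passing a seeded random.Random instance as rng makes the choice reproducible.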
Upvotes: 1