Reputation: 9499
Background:
I have a PowerShell script that processes some XML files. These XML files have embedded 'tokens' and 'filters'. The tokens are resolved by my script, and each filter is applied to the resolved value of the preceding token.
Tokens are defined like:
{!#T#TokenName#T#!}
Filters are defined like:
{!#F#FilterName#F#!}
Some of the tokens and filters have 'parameters'. Each parameter sits in its own parameter marker, and ALL parameters MUST be explicitly named. Three equals signs separate the parameter name from the parameter value:
{!#P#ParamName===ParamValue#P#!}
For example the following ‘RegVal’ token has two parameters ‘RegKey’ and ‘Name’:
{!#T#RegVal{!#P#RegKey===RegKeyPath#P#!}{!#P#Name===RegValName #P#!}#T#!}
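As a rough illustration of the syntax above, a flat (non-nested) token body could be split into its name and parameters along these lines. This is only a sketch: the function name Split-TokenBody is hypothetical and not part of the original script, trimming whitespace around names and values is an assumption, and [pscustomobject] assumes PowerShell 3 or later.

```powershell
# Hypothetical helper (not from the original script): splits a flat token body
# into the token name and a hashtable of its named parameters.
function Split-TokenBody([string] $body)
{
    # One parameter marker: {!#P#Name===Value#P#!}
    $paramPattern = '(?s)\{!#P#(?<name>.+?)===(?<value>.*?)#P#!\}'

    # Everything before the first parameter marker is taken as the token name.
    $first = $body.IndexOf('{!#P#')
    $name = if ($first -ge 0) { $body.Substring(0, $first) } else { $body }

    $parameters = @{}
    foreach ($m in [regex]::Matches($body, $paramPattern))
    {
        # Assumption: surrounding whitespace is not significant, so it is trimmed.
        $parameters[$m.Groups['name'].Value.Trim()] = $m.Groups['value'].Value.Trim()
    }

    [pscustomobject]@{ Name = $name.Trim(); Parameters = $parameters }
}

$token = Split-TokenBody 'RegVal{!#P#RegKey===RegKeyPath#P#!}{!#P#Name===RegValName #P#!}'
$token.Name                  # RegVal
$token.Parameters['RegKey']  # RegKeyPath
$token.Parameters['Name']    # RegValName
```

Because the value is matched lazily up to the first #P#!}, this sketch only handles parameters whose values contain no further markers.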
The Problem
I already have a working system that processes the tokens and filters with parameters (after I have extracted a string from the enclosing XML tags). I first identify the individual tokens with the regex below.
(?si){!#T#((?:(?!{!#T#.*#T#!}).)*)#T#!}
...The problem is that I now want to embed tokens within other tokens, such as:
{!#T#ContainingToken{!#P#ParamName==={!#T#RegVal{!#P#RegKey===HKLM:\SOFTWARE\TestKey#P#!}{!#P#Name===TestEntry#P#!}#T#!}#P#!}#T#!}
The above regex is not suitable for this. I am not a regex expert, and I had enough trouble writing the regex above, so it's time to ask for help.
I think this should be possible with an adjusted regex. The following limits are perfectly acceptable:
-embedding only one deep.
-only embedding within the parameter value (so after the: === )
-a second pass of the parameter to reveal any contained tokens and filters.
For reference, here is the relevant PowerShell fragment:
function Get-Matches($pattern)
{
    begin {
        Try {
            $regex = New-Object Regex($pattern)
        }
        Catch {
            Throw "Get-Matches: Pattern not correct. '$pattern' is not a valid regular expression."
        }
    }
    process {
        foreach ($match in ($regex.Matches($_)))
        {
            ([Object[]]$match.Groups)[-1].Value
        }
    }
}
function Get-ParsedInput([String] $rawValue)
{
    $intermediateValue = $rawValue
    $tokenMatches = @($intermediateValue | Get-Matches '(?si){!#T#((?:(?!{!#T#.*#T#!}).)*)#T#!}') # Wrapped as array...
    if ($tokenMatches.Count -gt 0)
    {
        $i = 1
        $tokens = @{ }
        foreach ($tokenTextWithParms in $tokenMatches)
        {
            # ...from here I instantiate new token instance...
Upvotes: 0
Views: 470
Reputation: 9499
Based on the second-to-last example in this blog post:
http://blog.stevenlevithan.com/archives/balancing-groups
I ended up with this:
(?x) {!\#T\# ( (?> (?! {!\#T\# | \#T\#!} ) . | {!\#T\# (?<Depth>) | \#T\#!} (?<-Depth>) )* (?(Depth)(?!)) ) \#T\#!}
It seems to work well enough, although I am not yet 100% sure why!
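For anyone wanting to see how this behaves, here is a commented sketch of the same idea applied to the nested example from the question, followed by a second pass over the outer token's body. The variable names are illustrative only. Note that in .NET's (?x) mode an unescaped # starts a comment, so this sketch escapes the # characters in the markers; it also adds the s option so that . matches newlines, which is an assumption carried over from the question's original regex.

```powershell
# A here-string keeps the pattern readable; (?x) ignores the layout whitespace.
$pattern = @'
(?xs)
\{!\#T\#                              # opening token marker
(                                     # group 1: the token body
  (?>
      (?! \{!\#T\# | \#T\#!\} ) .     # any character that does not start a marker
    | \{!\#T\#  (?<Depth>)            # nested opener: push onto the Depth stack
    | \#T\#!\}  (?<-Depth>)           # nested closer: pop from the Depth stack
  )*
  (?(Depth)(?!))                      # fail if an opener is still unclosed
)
\#T\#!\}                              # closing token marker
'@

$regex = New-Object Regex($pattern)
$text = '{!#T#ContainingToken{!#P#ParamName==={!#T#RegVal{!#P#RegKey===HKLM:\SOFTWARE\TestKey#P#!}{!#P#Name===TestEntry#P#!}#T#!}#P#!}#T#!}'

# First pass: the outermost token, with the nested token left intact in its body.
$outer = $regex.Match($text)
$outerBody = $outer.Groups[1].Value

# Second pass: run the same pattern over the body to reveal the embedded token.
$inner = $regex.Match($outerBody)
$innerBody = $inner.Groups[1].Value

$outerBody
$innerBody
```

The closing marker of the outer token cannot be consumed inside the loop, because popping from an empty Depth stack fails; that is what leaves it for the final \#T\#!\} to match.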
Upvotes: 0
Reputation: 29449
As for nested patterns: regexes are generally not the right tool for them, because they originate from regular grammars, which cannot handle "counting". In .NET (and therefore also in PowerShell), however, it may be possible. Have a look at http://blogs.msdn.com/b/bclteam/archive/2005/03/15/396452.aspx . There are probably other sources as well, but this is the first one I came across.
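To make the "counting" point concrete, here is a minimal sketch using .NET balancing groups on plain parentheses rather than the token markers from the question. The group name Depth is arbitrary.

```powershell
# Each '(' pushes an empty capture onto the Depth stack, each ')' pops one.
# The final conditional fails the match if anything is left on the stack.
$balanced = '^(?>[^()]+|\((?<Depth>)|\)(?<-Depth>))*(?(Depth)(?!))$'

$ok        = 'a(b(c)d)e' -match $balanced   # True: every '(' has a matching ')'
$unclosed  = 'a(b(c)d'   -match $balanced   # False: one '(' is never closed
$unopened  = 'a)b('      -match $balanced   # False: ')' appears before any '('
```

The stack of captures plays the role of the counter that a classical regular expression lacks.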
Upvotes: 1