Mannol

Reputation: 39

Regex: capture multiple matches in a group

I'm not sure if this is possible. I'm looking for a way to capture multiple matches in a group.

This works perfectly fine:

"Catch me if you can" -match "(?=.*(Catch))"
Result: Catch

I would like to have the result of two matches in the group:

"Catch me if you can" -match "(?=.*Catch)(?=.*me)"
Expected Result: Catch me

Upvotes: 0

Views: 1690

Answers (2)

mklement0

Reputation: 439277

Note: If hard-coding the result of both regex subexpressions matching is sufficient, simply use:
if ('Catch me if you can' -match '(?=.*Catch)(?=.*me)') { 'Catch me' }

You're trying to:

  • match two separate regex subexpressions,
  • and report what specific strings they matched only if BOTH matched.

Note:

  • While it is possible to use a variation of your regex that concatenates two look-ahead assertions, (?=.*(Catch))(?=.*(me)), to extract what the two subexpressions of interest captured, the captured substrings would be reported in the order in which the subexpressions are specified in the regex, not in the order in which the substrings appear in the input string. E.g., input string 'me Catch if you can' would also result in output string 'Catch me'.

  • The following solution uses the [regex]::Match() .NET API for preserving the input order of the captured substrings by sorting the captures by their starting position in the input string:

$match = [regex]::Match('me Catch if you can', '(?=.*(Catch))(?=.*(me))', 'IgnoreCase')
if ($match.Success) { ($match.Groups | Select-Object -Skip 1 | Sort-Object Index).Value -join ' ' }

Note the use of the IgnoreCase option, so as to match PowerShell's default behavior of case-insensitive matching.

The above outputs 'me Catch', i.e. the captured substrings in the order in which they appear in the input string.

If instead you prefer that the captured substrings be reported in the order in which the subexpressions that matched them appear in the regex ('Catch me'), simply omit | Sort-Object Index from the command above.
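
If you need this for more than two words, the same idea can be wrapped in a small function. The following is only a sketch: the function name Get-OrderedCapture and its parameter names are hypothetical, and it assumes the words are to be matched literally (hence [regex]::Escape()).

```powershell
function Get-OrderedCapture {   # hypothetical helper name
    param(
        [string]   $InputText,
        [string[]] $Words
    )
    # Build one capturing look-ahead per word, e.g. (?=.*(Catch))(?=.*(me))
    $pattern = -join ($Words | ForEach-Object { '(?=.*(' + [regex]::Escape($_) + '))' })

    $m = [regex]::Match($InputText, $pattern, 'IgnoreCase')
    if ($m.Success) {
        # Skip group 0 (the overall match, which is empty here) and
        # sort the remaining groups by where they start in the input.
        ($m.Groups | Select-Object -Skip 1 | Sort-Object Index).Value -join ' '
    }
}

$ordered = Get-OrderedCapture -InputText 'me Catch if you can' -Words 'Catch', 'me'
$ordered   # -> 'me Catch'
```

If any of the words is missing from the input, the function produces no output.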

Alternatively, you could make your -match operation work as follows, by enclosing the subexpressions of interest in (...) to form capture groups and then accessing the captured substrings via the automatic $Matches variable. Note, however, that no information about matching positions is then available:

if ('me Catch if you can' -match '(?=.*(Catch))(?=.*(me))') {
  $Matches[1..2] -join ' '  # -> 'Catch me'
}

Note that this only works because a single match result captures both substrings of interest, due to the concatenation of two look-ahead assertions ((?=...)); because -match only ever looks for one match, the simpler 'Catch|me' regex would not work, as it would stop matching once either subexpression is found.
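
To illustrate the difference, here is a minimal sketch that contrasts -match with the [regex]::Matches() .NET method, which does return every match of an alternation, in input order. Unlike the look-ahead approach, it has to check after the fact that both words were found; the variable names are illustrative only.

```powershell
$text = 'me Catch if you can'

# -match stops after the first match, so only one of the two words is reported.
$null = $text -match 'Catch|me'
$first = $Matches[0]   # 'me', the leftmost match in the input

# [regex]::Matches() returns all matches, in the order they occur in the input.
$all = [regex]::Matches($text, 'Catch|me', 'IgnoreCase').Value

# Report the words only if both distinct words were found.
if (($all | Sort-Object -Unique).Count -eq 2) { $all -join ' ' }   # -> 'me Catch'
```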

See also:

  • GitHub issue #7867, which suggests introducing a -matchall operator that returns all matches found in the input string.

Upvotes: 1

Darin

Reputation: 2368

(?= starts a look-ahead assertion, but in your regex it isn't looking ahead of anything. In the following example, the look-ahead is placed after "Catch" and checks whether ".*me" can be found from that point on:

Catch(?=.*me)

Also, do you really want to match "catchABCme"? I would think you would want to match "catch ABC me", but not "catchABCme", "catchABC me", or "catch ABCme".

Here is some test code to play with:

$Lines = @(
    'catch ABC me if you can',
    'catch ABCme if you can',
    'catchABC me if you can'
)
$RegExCheckers = @(
    'Catch(?=.*me)',
    'Catch(?=.*\s+me)',
    'Catch\s(?=(.*\s+)?me)'
)

foreach ($RegEx in $RegExCheckers) {
    $RegExOut = "`"$RegEx`"".PadLeft(23,' ')
    foreach ($Line in $Lines) {
        $LineOut = "`"$Line`"".PadLeft(26,' ')
        if($Line -match $RegEx) {
            Write-Host "$RegExOut        matches $LineOut"
        } else {
            Write-Host "$RegExOut didn't match   $LineOut"
        }
    }
    Write-Host
}

And here is the output:

        "Catch(?=.*me)"        matches  "catch ABC me if you can"
        "Catch(?=.*me)"        matches   "catch ABCme if you can"
        "Catch(?=.*me)"        matches   "catchABC me if you can"

     "Catch(?=.*\s+me)"        matches  "catch ABC me if you can"
     "Catch(?=.*\s+me)" didn't match     "catch ABCme if you can"
     "Catch(?=.*\s+me)"        matches   "catchABC me if you can"

"Catch\s(?=(.*\s+)?me)"        matches  "catch ABC me if you can"
"Catch\s(?=(.*\s+)?me)" didn't match     "catch ABCme if you can"
"Catch\s(?=(.*\s+)?me)" didn't match     "catchABC me if you can"

As you can see, the last regex requires whitespace after "catch" and before "me".
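
If what you actually want is whole words rather than whitespace specifically, word-boundary assertions (\b) are another option. This is a sketch of an alternative and not one of the patterns tested above; the variable names and the fourth test line are my own additions.

```powershell
# \b matches at the transition between a word character and a non-word character.
$regex = '\bCatch\b(?=.*\bme\b)'

$lines = @(
    'catch ABC me if you can',   # both are whole words
    'catch ABCme if you can',    # "me" is glued to "ABC"
    'catchABC me if you can',    # "catch" is glued to "ABC"
    'catch me, if you can'       # punctuation after "me" still counts as a boundary
)

$results = foreach ($line in $lines) {
    [pscustomobject]@{ Line = $line; IsMatch = $line -match $regex }
}
$results   # IsMatch is True, False, False, True
```

Unlike the whitespace-based pattern, this also rejects input such as "catch meXYZ", because \b requires "me" to end at a word boundary.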

Also, a great place to test regexes is regex101.com: you place the regex at the top and the lines you want to test it against in the box in the middle.

Upvotes: 0
