lit

Reputation: 16256

How to capture multiple, unknown number of values using regex

I would like to capture multiple strings using a regex. I was hoping that $matches would contain the digita, digitb, and digitc values, but it appears to capture digita and stop. If possible, I would like the capture to be order-independent. How can I do that?

PS C:\src\t> $s2 = 'a=3 c=5 b=4'
PS C:\src\t> $s2 -match 'a=(?<digita>[0-9])|b=(?<digitb>[0-9])|c=(?<digitc>[0-9])'
True
PS C:\src\t> $matches

Name                           Value
----                           -----
digita                         3
0                              a=3

Upvotes: 1

Views: 328

Answers (2)

mklement0

Reputation: 439307

PowerShell's -match operator only ever finds (at most) one match, afterwards reflected in the automatic $Matches variable.[1]
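A short sketch of that single-match behavior, using the question's input string and a pattern (chosen here for illustration) that could match three times:

```powershell
$s2 = 'a=3 c=5 b=4'

# -match stops after the first match, even though three are possible here.
$found = $s2 -match '(?<letter>[abc])=(?<digit>[0-9])'
$found            # True
$Matches[0]       # a=3  (only the first match is recorded)
$Matches.letter   # a
```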

As Wiktor Stribiżew hints at in a comment on the question, calling the static .Matches() method of the underlying .NET [regex] class instead returns all matches.

However, the way your regex is written, each match would contain all 3 capture groups (digita, digitb, digitc), with only one of them containing a captured value, which makes it awkward to access the results.
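To illustrate that awkwardness, here is a sketch that runs the question's regex through [regex]::Matches(), assuming the three alternatives are meant to be separated by `|`, and then has to probe every named group of each match to find the one that actually captured something:

```powershell
$s2 = 'a=3 c=5 b=4'
# Assumption: the question's regex with '|' between all three alternatives.
$pattern = 'a=(?<digita>[0-9])|b=(?<digitb>[0-9])|c=(?<digitc>[0-9])'

# [regex]::Matches() returns every match, not just the first one.
$allMatches = [regex]::Matches($s2, $pattern)

$results = foreach ($match in $allMatches) {
  # Each match carries all three named groups; only the group whose
  # alternative matched has .Success set to $true.
  foreach ($name in 'digita', 'digitb', 'digitc') {
    $group = $match.Groups[$name]
    if ($group.Success) { '{0} = {1}' -f $name, $group.Value }
  }
}
$results   # digita = 3, digitc = 5, digitb = 4 (one per line)
```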

The following (PSv4+ syntax) instead uses:

  • a generically named capture group for the digit ((?<digit>...))
  • a generically named capture group for the letter as well ((?<letter>...))
  • and no alternation (|)

so that each match contains 2 capture groups that contain the captured letter and digit, respectively.

$s2 = 'a=3 c=5 b=4'
$allMatches = [regex]::Matches($s2, '(?<letter>[abc])=(?<digit>[0-9])')
$allMatches.ForEach({ 
  'letter: {0} - digit: {1}' -f  $_.Groups['letter'].value, $_.Groups['digit'].value 
})

In PSv3 and earlier, which lack the .ForEach() collection method, you can use the foreach statement to iterate over the matches (foreach ($match in $allMatches) { ... }).
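Spelled out in full, the foreach-statement variant might look like this; it is a sketch that should also work in PSv2/PSv3 and produces the same three lines as the .ForEach() version:

```powershell
$s2 = 'a=3 c=5 b=4'
$allMatches = [regex]::Matches($s2, '(?<letter>[abc])=(?<digit>[0-9])')

# The foreach statement predates the .ForEach() method (PSv4+).
$lines = foreach ($match in $allMatches) {
  'letter: {0} - digit: {1}' -f $match.Groups['letter'].Value, $match.Groups['digit'].Value
}
$lines
```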

The above yields:

letter: a - digit: 3
letter: c - digit: 5
letter: b - digit: 4

Note that what [regex]::Matches() returns is a [System.Text.RegularExpressions.MatchCollection] instance, which is a collection of [System.Text.RegularExpressions.Match] instances.
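A quick way to confirm those types interactively, and to see that each Match also records where in the input it was found (its zero-based .Index):

```powershell
$s2 = 'a=3 c=5 b=4'
$allMatches = [regex]::Matches($s2, '(?<letter>[abc])=(?<digit>[0-9])')

$allMatches.GetType().FullName     # System.Text.RegularExpressions.MatchCollection
$allMatches.Count                  # 3
$allMatches[0].GetType().FullName  # System.Text.RegularExpressions.Match
$allMatches[1].Value               # c=5
$allMatches[1].Index               # 4 (zero-based offset into $s2)
```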


An alternative is to use Select-String with the -AllMatches switch, which, however, is slower; whether that matters depends on the use case (PSv2+):

$s2 = 'a=3 c=5 b=4'
$s2 | Select-String -AllMatches '(?<letter>[abc])=(?<digit>[0-9])' |
  Select-Object -ExpandProperty Matches | ForEach-Object { 
    'letter: {0} - digit: {1}' -f  $_.Groups['letter'].value, $_.Groups['digit'].value
  }

Select-String outputs [Microsoft.PowerShell.Commands.MatchInfo] instances, whose .Matches property contains an array of, again, [System.Text.RegularExpressions.Match] instances.
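To see that structure without the Select-Object step, the MatchInfo object can be captured and inspected directly. Because the input is a single line, Select-String emits exactly one MatchInfo here:

```powershell
$s2 = 'a=3 c=5 b=4'
# One MatchInfo per matching input line; -AllMatches makes its
# .Matches property hold every match found on that line.
$matchInfo = $s2 | Select-String -AllMatches '(?<letter>[abc])=(?<digit>[0-9])'

$matchInfo.GetType().FullName                # Microsoft.PowerShell.Commands.MatchInfo
$matchInfo.Matches.Count                     # 3
$matchInfo.Matches[2].Groups['digit'].Value  # 4
```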


[1] $Matches contains a [hashtable] instance whose 0 entry holds the overall match; unnamed capture groups start at entry 1, and named capture groups can be accessed by their name, such as $Matches.digita in the example from the question.
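A minimal illustration of both cases, using one-off sample strings:

```powershell
# Unnamed groups: entry 0 is the overall match, groups start at 1.
if ('a=3' -match '([abc])=([0-9])') {
  $Matches[0]   # a=3
  $Matches[1]   # a
  $Matches[2]   # 3
}

# Named groups are keyed by their name.
if ('a=3' -match 'a=(?<digita>[0-9])') {
  $Matches.digita   # 3
}
```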

Upvotes: 1

PlageMan

Reputation: 792

You could reverse the problem and run one -match per letter, something like this:

"a","b","c" | % { "$_=(?<digit$_>[0-9])" } | % { if ($s2 -match $_) { $matches } }

Outputs

Name                           Value
----                           -----
digita                         3
0                              a=3
digitb                         4
0                              b=4
digitc                         5
0                              c=5
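If a single lookup table, like the one the question hoped $matches would be, is preferred, the per-letter -match results can be merged into one ordered hashtable. This is a sketch (the name $result is illustrative; [ordered] requires PSv3+):

```powershell
$s2 = 'a=3 c=5 b=4'
$result = [ordered]@{}

foreach ($letter in 'a', 'b', 'c') {
  # One -match per letter, so the order of the pairs in $s2 doesn't matter.
  if ($s2 -match "$letter=(?<digit$letter>[0-9])") {
    $result["digit$letter"] = $Matches["digit$letter"]
  }
}
$result   # digita = 3, digitb = 4, digitc = 5
```

Letters that don't occur in the input simply leave no entry, rather than repeating a stale $Matches from an earlier iteration.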

Upvotes: 2
