Yuriy Samorodov

Reputation: 75

Powershell + Regular Expressions - How to get multiple matches?

I have racked my brain trying to solve the problem below, and I would appreciate any comment or piece of advice on it.

Prerequisites

  1. HTML text

    <div style="font-size:8pt; font-family: Calibri, sans-serif;">Some text here</div>

  2. PowerShell v3

Task

Parse the given text and select only the tags.

Approach

$text_to_parse = '<div style="font-size:8pt; font-family: Calibri, sans-serif;">Some    text here</div>'
if($text_to_parse -match '</?div[^<>]*>'){$Matches | fl}
Name  : 0
Value : <div style="font-size:8pt; font-family: Calibri, sans-serif;">

Issues

1) As you can see, it does not show the second match, even though the pattern starts with the optional /?.

2) I understand that there must be a "Global" anchor, but I cannot find it even on MSDN: http://msdn.microsoft.com/library/az24scfc.aspx

3) The \G anchor doesn't work either, even if I add a pattern for one or more characters at the beginning:

if($text_to_parse -match '\G<.*?/?div[^<>]*>'){$Matches | fl}

Name  : 0
Value : <div style="font-size:8pt; font-family: Calibri, sans-serif;">

Questions

1) What am I doing wrong? I spent well over 4 hours trying to figure it out, without any success.

2) Is there any "Global" anchor in PowerShell's regex implementation?

3) Finally, how can I match both HTML tags with regular expressions only? I can do something like this:

($text_to_parse -replace '\G<.*?/?div[^<>]*>',"").TrimEnd("</div>")

And get this:

Some text here

But I'd like to do this with Regular Expressions.

Kind regards, Yuriy

Upvotes: 3

Views: 8800

Answers (2)

sonjz

Reputation: 5090

The -match operator only returns the first match. In order to get multiple matches, use the following syntax:

$text_to_parse = '<div style="font-size:8pt; font-family: Calibri, sans-serif;">Some    text here</div>' ;
# Use a custom variable name: $Matches is an automatic variable set by -match
$allMatches = ([regex]'</?div[^<>]*>').Matches($text_to_parse) ;
$allMatches[1].Value ; # returns your second occurrence, "</div>"

This method returns the collection of all matches (a MatchCollection), which you can index or iterate over in any way you wish.
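
As a minimal sketch of processing every match rather than a single index, the same pattern can be fed to the static [regex]::Matches method and the results piped onward. The variable names $text, $tagMatches and $tags are chosen here for illustration only. Note that, unlike -match, the .NET regex class is case-sensitive by default.

```powershell
# Sample input taken from the question.
$text = '<div style="font-size:8pt; font-family: Calibri, sans-serif;">Some text here</div>'

# [regex]::Matches returns every non-overlapping match, not just the first one.
$tagMatches = [regex]::Matches($text, '</?div[^<>]*>')

# Collect the matched text of each tag into an array of strings.
$tags = @($tagMatches | ForEach-Object { $_.Value })

$tags.Count   # 2
$tags[1]      # </div>
```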

Upvotes: 3

stej

Reputation: 29469

If I understand it correctly, you would like to match the text inside the tags. Then use something like this:

$text_to_parse -replace '<div[^>]+>(.*?)</div>', '$1'

It returns just the text:

Some text here
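
If the goal is to extract the inner text rather than strip the tags, the same capture group can be read directly from a match object. This is only a sketch: the shortened sample string and the names $text, $m and $inner are assumptions made for illustration.

```powershell
# Shortened sample input (assumed for illustration).
$text = '<div style="font-size:8pt;">Some text here</div>'

# Group 1 captures whatever sits between the opening and closing div tags.
$m = [regex]::Match($text, '<div[^>]*>(.*?)</div>')

if ($m.Success) {
    $inner = $m.Groups[1].Value
    $inner   # Some text here
}
```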


Besides that, getting multiple matches reminds me of this task:

Given the text "ab cd ef ax 0 a0", select all strings that begin with "a".

Then

$s = "ab cd ef ax 0 a0"
$s -match '\ba\w'

is useless, because it only reports the first match, but you can go with this:

$s | Select-String '\ba\w' -AllMatches |
   % { $_.Matches } |                        # select matches
   % { $_.Value }                            # select values from matches

In V3 there may be a simpler way; this version works in V2.
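
One possible V3 shortening, assuming PowerShell 3.0 or later, relies on member enumeration, so the two ForEach-Object steps collapse into property access. The variable name $words is chosen here for illustration.

```powershell
$s = 'ab cd ef ax 0 a0'

# In PowerShell 3.0+, .Matches.Value enumerates the Value of every match.
$words = @(($s | Select-String '\ba\w' -AllMatches).Matches.Value)

$words   # ab, ax, a0 (one per line)
```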

Upvotes: 2
