Reputation: 75
I have been racking my brain trying to solve the problem below, and I would appreciate any comment or piece of advice on it.
1) HTML text
<div style="font-size:8pt; font-family: Calibri, sans-serif;">Some text here</div>
2) PowerShell v3
The goal is to parse the given text and select only the tags:
$text_to_parse = '<div style="font-size:8pt; font-family: Calibri, sans-serif;">Some text here</div>'
if($text_to_parse -match '</?div[^<>]*>'){$Matches | fl}
Name : 0
Value : <div style="font-size:8pt; font-family: Calibri, sans-serif;">
1) As you can see, it does not show the second match despite the /? quantifier.
2) I understand that there must be a "Global" anchor, but I cannot find it even on MSDN: http://msdn.microsoft.com/library/az24scfc.aspx
3) The \G anchor doesn't work either, even if I add a pattern for one or more characters at the beginning:
if($text_to_parse -match '\G<.*?/?div[^<>]*>'){$Matches | fl}
Name : 0
Value : <div style="font-size:8pt; font-family: Calibri, sans-serif;">
1) What am I doing wrong? I spent well over 4 hours trying to figure it out without any success.
2) Is there any "Global" anchor in the regex implementation in PowerShell?
3) Finally, how can I match both HTML tags with regular expressions only? I can do something like this:
($text_to_parse -replace '\G<.*?/?div[^<>]*>',"").TrimEnd("</div>")
And get this:
Some text here
But I'd like to do this with regular expressions alone.
Kind regards, Yuriy
Upvotes: 3
Views: 8800
Reputation: 5090
The -match operator only returns the first match. To get multiple matches, use the following syntax:
$text_to_parse = '<div style="font-size:8pt; font-family: Calibri, sans-serif;">Some text here</div>'
# Use a name other than $matches, which is the automatic variable that -match fills in.
$tagMatches = ([regex]'</?div[^<>]*>').Matches($text_to_parse)
$tagMatches[1].Value # returns the second occurrence, "</div>"
This method returns a MatchCollection, which you can index or enumerate like an array and process in any way you wish.
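As for the "Global" question: as far as I know, .NET regular expressions have no global flag or anchor (\G only ties a match to the position where the previous match ended), and asking for all matches at once plays that role instead. A minimal sketch using the static [regex]::Matches method; the names $html and $tags are just illustrative:

```powershell
$html = '<div style="font-size:8pt; font-family: Calibri, sans-serif;">Some text here</div>'

# [regex]::Matches returns every non-overlapping match, not only the first one.
# Piping the collection enumerates the individual Match objects.
$tags = [regex]::Matches($html, '</?div[^<>]*>') | ForEach-Object { $_.Value }

$tags   # emits the opening tag first, then the closing tag
```

The result is a plain array of strings, so $tags.Count and $tags[1] work as usual.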
Upvotes: 3
Reputation: 29469
If I understand it correctly, you would like to match the text inside the tags. Then use something like this:
$text_to_parse -replace '<div[^>]+>(.*?)</div>', '$1'
This returns just the text:
Some text here
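If you would rather read the inner text than replace the tags away, the same capture group can be taken from a Match object. A sketch, assuming the input holds a single div on one line; $m and $inner are illustrative names:

```powershell
$text_to_parse = '<div style="font-size:8pt; font-family: Calibri, sans-serif;">Some text here</div>'

# Same pattern as in the -replace call; group 1 captures the text between the tags.
$m = [regex]::Match($text_to_parse, '<div[^>]+>(.*?)</div>')

$inner = $null
if ($m.Success) {
    $inner = $m.Groups[1].Value
}

$inner   # Some text here
```

Checking $m.Success first avoids silently getting an empty string when the pattern does not match.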
Besides that, getting multiple matches reminds me of this task:
Given the text "ab cd ef ax 0 a0", select all strings that begin with "a".
Then
$s = "ab cd ef ax 0 a0"
$s -match '\ba\w'
is useless, because it only reports the first match, but you can go with this:
$s | Select-String '\ba\w' -AllMatches |
% { $_.Matches } | # select matches
% { $_.Value } # select values from matches
In V3 it may be simpler; this is for V2.
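For V3 and later, one possible shortening relies on member enumeration, where a property name applied to an array is read from each element. A sketch under that assumption; $words is an illustrative name:

```powershell
$s = "ab cd ef ax 0 a0"

# .Matches is an array of Match objects; in PowerShell 3.0+ the trailing .Value
# is read from each element, so no ForEach-Object stages are needed.
$words = ($s | Select-String '\ba\w' -AllMatches).Matches.Value

$words   # ab, ax, a0
```

On V2 the trailing .Value returns nothing useful, so the pipeline version above is still the one to use there.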
Upvotes: 2