JustCurious

Reputation: 183

Regex match Improvement

I have this text:

<td class="devices-user-name">devicename</td>
            <td>192.168.133.221</td>
            <td>Storage Sync</td>
            <td>10.3.3.335</td>
            <td>Active</td>
            <td>7/26/2016 8:39PM</td>
            <td class="devices-details-button"><a class="btn btn-mini" href="#settings/devices/1/239a9cd0-d6c9-4e7d-9918-0cd686a57aac">Details</a></td>

I want to capture everything between <td> and </td>, as well as between <td class=...> and </td>.

What I came up with is this regex:

<td.*>(.*?)<\/td>(\n(.*<td>(.*?)<\/td>))(\n(.*<td>(.*?)<\/td>))(\n(.*<td>(.*?)<\/td>))(\n(.*<td>(.*?)<\/td>))(\n(.*<td>(.*?)<\/td>))(\n(.*<td.*href="(.*?)"))

After that I still need to exclude all the <td> matches:

$MatchResult = $Matches.GetEnumerator() | ? {$_.Value -notmatch 'td'} | Sort Name

Finally, I get these results:

Name                           Value
----                           -----
1                              devicename
4                              192.168.133.221
7                              Storage Sync
10                             10.3.3.335
13                             Active
16                             7/26/2016 8:39PM
19                             #settings/devices/1/239a9cd0-d6c9-4e7d-9918-0cd686a57aac

But I'm quite sure there's a better way than duplicating the groups, excluding matches afterwards, and so on. I'd be happy to learn other or better techniques.

What is your suggestion?

Upvotes: 1

Views: 156

Answers (1)

Martin Brandl

Reputation: 58991

You can use [regex]::Matches to get every match of a single pattern (instead of repeating the group for each line and chaining the copies with \n):

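# -Raw reads the whole file as one string (PowerShell 3.0 and later)
$content = Get-Content 'your-File' -Raw
# Groups[1] holds the text captured between the opening and closing td tags
[regex]::Matches($content, '<td.*?>(.+?)<\/td>') | ForEach-Object {
    $_.Groups[1].Value
}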

Regex:

<td.*?>(.+?)<\/td>

Output:

devicename
192.168.133.221
Storage Sync
10.3.3.335
Active
7/26/2016 8:39PM
<a class="btn btn-mini" href="#settings/devices/1/239a9cd0-d6c9-4e7d-9918-0cd686a57aac">Details</a>

Note: You probably want to extract the href in another step or by adjusting the regex - but your question was about catching everything between <td>...
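One way the "another step" could look is sketched below. This is only an illustration, not part of the original answer: the sample text is a shortened copy of the input from the question, and the `$cell` and `$values` variable names and the `href="([^"]*)"` pattern are assumptions added here. Each captured cell is checked for an href attribute; if one is found, its value is returned instead of the whole anchor tag.

```powershell
# Sample input: a shortened copy of the HTML from the question.
$content = @'
<td class="devices-user-name">devicename</td>
            <td>192.168.133.221</td>
            <td class="devices-details-button"><a class="btn btn-mini" href="#settings/devices/1/239a9cd0-d6c9-4e7d-9918-0cd686a57aac">Details</a></td>
'@

# $values and $cell are illustrative names, not from the original answer.
$values = [regex]::Matches($content, '<td.*?>(.+?)</td>') | ForEach-Object {
    $cell = $_.Groups[1].Value
    # If the cell wraps a link, keep only the href value; otherwise keep the cell text.
    if ($cell -match 'href="([^"]*)"') { $Matches[1] } else { $cell }
}

$values
```

With this sample, the last item comes back as the bare `#settings/devices/1/...` path rather than the full `<a ...>Details</a>` markup, which matches the result the question was after.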

Upvotes: 2
