Powershell modifying HTML from ConvertTo-HTML

I have a script that generates an array of objects that I want to email out in HTML format. That part works fine. I am trying to modify the HTML string to make certain rows a different font color.

Part of the HTML string looks like this (two rows shown):

<tr>
    <td>ABL - Branch5206 Daily OD Report</td>
    <td>'\\CTB052\Shared_Files\FIS-BIC Reporting\Report Output Files\ABL\Operations\Daily\ABL - Branch5206 Daily OD Report.pdf'</td>
    <td>13124</td>
    <td>4/23/2013 8:05:34 AM</td>
    <td>29134</td>
    <td>0</td>
    <td>Delivered</td>
</tr>

<tr>
    <td>ABL - Branch5206 Daily OD Report</td>
    <td>'\\CTB052\Shared_Files\FIS-BIC Reporting\Report Output Files\ABL\Operations\Daily\ABL - Branch5206 Daily OD Report.xls'</td>
    <td>15716</td>
    <td>4/23/2013 8:05:34 AM</td>
    <td>29134</td>
    <td>0</td>
    <td>Delivered</td>
</tr>

I tried using a regex to wrap the rows that end with "Delivered" in a font color:

$email = [regex]::Replace($email, "<tr><td>(.*?)Delivered</td></tr>", '<tr><font color = green><td>$1Delivered</td></font></tr>')

This didn't work (I'm not sure whether you can set the font color for a whole row like that).

Any ideas on how to do this easily and efficiently? I need to do it for several different statuses (like Delivered).

Answers (1)

jpmc26

Disclaimer: HTML cannot be parsed by a regular expression parser. A regular expression will NOT provide a general solution to this problem. If your HTML structure is well known and you don't have any other <tr></tr> elements, though, the following might work. On that note, is there some reason you can't change the HTML generation to do this, instead of waiting until the HTML has already been generated?
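If you do control the generation step, one option is to build the table rows yourself and add the color while each row is written. The sketch below is illustrative only: the function name ConvertTo-ColoredRows, the color map, and the assumption that each object has a Status property are all hypothetical, not part of ConvertTo-Html. It emits one HTML-encoded <td> per property and adds a style attribute to rows whose status is in the map (System.Net.WebUtility needs .NET 4 or later).

```powershell
function ConvertTo-ColoredRows {
    param(
        [object[]]$Items,
        # Hypothetical status -> CSS color map, e.g. @{ Delivered = 'green' }.
        [hashtable]$Colors,
        # Name of the property holding the status (assumed to be 'Status').
        [string]$StatusProperty = 'Status'
    )
    foreach ($item in $Items) {
        # [string] turns a missing or $null status into '', which simply has no color.
        $color = $Colors[[string]($item.$StatusProperty)]
        $style = if ($color) { ' style="color: ' + $color + '"' } else { '' }
        # One HTML-encoded <td> per property, in property order.
        $cells = foreach ($prop in $item.PSObject.Properties) {
            '<td>' + [System.Net.WebUtility]::HtmlEncode([string]($prop.Value)) + '</td>'
        }
        '<tr' + $style + '>' + ($cells -join '') + '</tr>'
    }
}
```

You would still have to wrap the rows in <table></table> and add a header row yourself; that is the trade-off for bypassing ConvertTo-Html for the table body.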

Try this command:

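PS > $email = $email -replace '(?s)<tr>((?:(?!</tr>).)*?)<td>Delivered</td>(.*?)</tr>','<tr style="color: #FF0000">$1<td>Delivered</td>$2</tr>'

The first string is the pattern. The (?s) turns on "single-line" mode, which lets . match newlines too. The pattern then matches a <tr> element that contains the string <td>Delivered</td>, and the two capture groups grab everything else in that element before and after <td>Delivered</td>. Take note of the question marks following the *s: * by itself is greedy and matches as much text as possible, while *? matches as little text as possible. With a plain *, your entire string would be treated as one match and only the first <tr> would be modified. Laziness alone doesn't keep the first group inside one row, though: starting at the <tr> of a row that isn't Delivered, .*? would keep expanding past that row's </tr> until it reached the next <td>Delivered</td>, so the style would land on the wrong row. That is why the first group uses (?:(?!</tr>).)*? instead: it matches any character that does not start a </tr>, so the match cannot leave the row it started in.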
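A quick way to see why the first group has to stay inside one row is to run the plain lazy pattern and a row-bounded variant side by side. This is a sketch; the sample rows (Report A, Report B) are made up, with a Failed row placed before a Delivered row.

```powershell
# Sample input: a non-Delivered row followed by a Delivered row.
$html = @'
<tr>
    <td>Report A</td>
    <td>Failed</td>
</tr>
<tr>
    <td>Report B</td>
    <td>Delivered</td>
</tr>
'@

# Plain lazy pattern: the match starts at the FIRST <tr> and runs on into the
# Delivered row, so the style attribute ends up on the Failed row.
$plain = $html -replace '(?s)<tr>(.*?)<td>Delivered</td>(.*?)</tr>', '<tr style="color: #FF0000">$1<td>Delivered</td>$2</tr>'

# Row-bounded pattern: group 1 may not cross a </tr>, so only the Delivered row is styled.
$bounded = $html -replace '(?s)<tr>((?:(?!</tr>).)*?)<td>Delivered</td>(.*?)</tr>', '<tr style="color: #FF0000">$1<td>Delivered</td>$2</tr>'

$plain
$bounded
```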

The second string is the replacement. It puts the <tr> element and its contents back in place with an added style attribute, using the $1 and $2 substitutions to reinsert the text the two groups captured.

One other minor note is the quoting. I tend toward single quotes anyway, but here the replacement string contains double quotes, and inside a double-quoted string PowerShell would also expand $1 and $2 as its own variables before -replace ever sees them. So single quotes are the way to go.
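To make the quoting point concrete, here is a small sketch that writes the same replacement once with double quotes and once with single quotes. The sample row is made up for illustration; escaping the dollar sign with a backtick (`$1) inside double quotes would also work.

```powershell
$row = '<tr><td>Delivered</td></tr>'

# Double quotes: PowerShell expands $1 as its own (normally undefined, so empty)
# variable before -replace runs, and the inner quotes need backtick escapes.
$wrong = $row -replace '<tr>(.*?)</tr>', "<tr style=`"color: green`">$1</tr>"

# Single quotes: $1 reaches the regex engine untouched and becomes capture group 1.
$right = $row -replace '<tr>(.*?)</tr>', '<tr style="color: green">$1</tr>'

$wrong   # <tr style="color: green"></tr>
$right   # <tr style="color: green"><td>Delivered</td></tr>
```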

As for handling different statuses, regular expressions really aren't designed for conditional content like that; it's like trying to use a screwdriver as a drill. You can hard-code several replaces, or loop over status/color pairs and build your pattern and replacement strings from them. A full-blown HTML parser would be more efficient if you can find one for .NET; you might get away with an XML parser if you can guarantee the output is valid XML. Or, going back to my question at the beginning, you could modify the HTML generation. If you only send a few e-mails, though, this may not be a bottleneck worth addressing, and development time is costly too. See if it's fast enough and try a different route if not.
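The loop over status/color pairs could look roughly like this. It is a sketch under a few assumptions: the function name Add-StatusColor and the color map are hypothetical, and each status is assumed to appear only in its own <td> (a cell elsewhere in the row with the same text would also trigger the color). Only the opening <tr> is rewritten, so the rest of the row is left untouched, and a row that has already been styled no longer matches the literal <tr> on later passes.

```powershell
# Hypothetical status -> CSS color map; substitute your own statuses and colors.
$statusColors = @{
    Delivered = 'green'
    Failed    = 'red'
    Pending   = 'orange'
}

function Add-StatusColor {
    param(
        [string]$Html,
        [hashtable]$Colors
    )
    foreach ($status in $Colors.Keys) {
        # Escape the status so characters such as '.' or '(' are matched literally.
        $cell = '<td>' + [regex]::Escape($status) + '</td>'
        # Group 1 runs from just after <tr> through the status cell and may not
        # cross a </tr>, so each match stays inside a single row.
        $pattern = '(?s)<tr>((?:(?!</tr>).)*?' + $cell + ')'
        # Only the opening tag changes; $1 puts the captured cells back unchanged.
        $replacement = '<tr style="color: ' + $Colors[$status] + '">$1'
        $Html = $Html -replace $pattern, $replacement
    }
    $Html
}
```

Applied to the e-mail body, that would be $email = Add-StatusColor -Html $email -Colors $statusColors. Note that -replace is case-insensitive by default; use -creplace if the status text must match exactly.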

Credit where it's due: I took the HTML style attribute from @FrankieTheKneeMan.
