Reputation: 107
I'm still a beginner in PowerShell, so I decided to ask after trying without success.
I have HTML like the sample below. I need to extract the word Chile, which appears in the TR tag, and all values in the TD tags, and export them to a .txt file.
The code below works perfectly, BUT it depends on the font color:
$result = [regex]::Matches($content, 'style="color:black;".*?>(.*?)</span>')
$result | select { ($_.Groups[1].Value -replace ' ', '' -replace '​', '').Trim().Trim(',')} | Out-file $outfile -Encoding ascii
As you can see in the HTML, some columns (TD) do not have that pattern.
How can I get these values in PowerShell? I've tried the options below, but no luck:
$result = [regex]::Matches($content, 'style="windowtext;".*?>(.*?)</td>')
$result | select { ($_.Groups[1].Value -replace ' ', '').Trim().Trim(',')} | Out-file $outfile
$result = [regex]::Matches($content, '<td.*?>(.+)</td>')
$result = [regex]::Matches($content, '<td.*?>(.*?)</td>') | % { $_.Captures[0].Groups[1].value} | Out-file $outfile
Again, I need to extract the word Chile, which appears in the TR tag, and all values in the TD tags, and export them to a .txt file.
<tr class="ms-rteFontSize-1 ms-rteTableOddRow-1" dir="rtl" style="height:15pt;"><th class="ms-rteTableFirstCol-1" rowspan="1" colspan="1" style="border-width:medium 1pt 1pt;border-style:none solid solid;padding:0in 5.4pt;width:100px;height:15pt;border-right-color:windowtext;border-bottom-color:windowtext;border-left-color:windowtext;"><div><b><span style="color:black;">Chile</span></b></div></th>
<td width="64" class="ms-rteTableOddCol-1" valign="bottom" style="border-width:medium 1pt 1pt medium;border-style:none solid solid none;padding:0in 5.4pt;width:48pt;height:15pt;border-right-color:windowtext;border-bottom-color:windowtext;">2</td>
<td class="ms-rteTableEvenCol-1" valign="bottom" style="border-width:medium 1pt 1pt medium;border-style:none solid solid none;padding:0in 5.4pt;width:66px;height:15pt;border-right-color:windowtext;border-bottom-color:windowtext;"> </td>
<td class="ms-rteTableOddCol-1" valign="bottom" style="border-width:medium 1pt 1pt medium;border-style:none solid solid none;padding:0in 5.4pt;width:81px;height:15pt;border-right-color:windowtext;border-bottom-color:windowtext;"> </td>
<td width="64" class="ms-rteTableEvenCol-1" valign="bottom" style="border-width:medium 1pt 1pt medium;border-style:none solid solid none;padding:0in 5.4pt;width:48pt;height:15pt;border-right-color:windowtext;border-bottom-color:windowtext;">14,19</td>
<td width="64" class="ms-rteTableOddCol-1" valign="bottom" style="border-width:medium 1pt 1pt medium;border-style:none solid solid none;padding:0in 5.4pt;width:48pt;height:15pt;border-right-color:windowtext;border-bottom-color:windowtext;"><div><span style="color:black;">1</span></div></td>
<td width="64" class="ms-rteTableEvenCol-1" valign="bottom" style="border-width:medium 1pt 1pt medium;border-style:none solid solid none;padding:0in 5.4pt;width:48pt;height:15pt;border-right-color:windowtext;border-bottom-color:windowtext;"><div><span style="color:black;">26</span></div></td>
<td width="64" class="ms-rteTableOddCol-1" valign="bottom" style="border-width:medium 1pt 1pt medium;border-style:none solid solid none;padding:0in 5.4pt;width:48pt;height:15pt;border-right-color:windowtext;border-bottom-color:windowtext;"> </td>
<td width="64" class="ms-rteTableEvenCol-1" valign="bottom" style="border-width:medium 1pt 1pt medium;border-style:none solid solid none;padding:0in 5.4pt;width:48pt;height:15pt;border-right-color:windowtext;border-bottom-color:windowtext;"><div><span style="color:black;">15</span></div></td>
<td class="ms-rteTableOddCol-1" valign="bottom" style="border-width:medium 1pt 1pt medium;border-style:none solid solid none;padding:0in 5.4pt;width:80px;height:15pt;border-right-color:windowtext;border-bottom-color:windowtext;"><div><span style="color:black;">18,19</span></div></td>
<td width="64" class="ms-rteTableEvenCol-1" valign="bottom" style="border-width:medium 1pt 1pt medium;border-style:none solid solid none;padding:0in 5.4pt;width:48pt;height:15pt;border-right-color:windowtext;border-bottom-color:windowtext;"><div><span style="color:black;">9,27</span></div></td>
<td class="ms-rteTableOddCol-1" valign="bottom" style="border-width:medium 1pt 1pt medium;border-style:none solid solid none;padding:0in 5.4pt;width:80px;height:15pt;border-right-color:windowtext;border-bottom-color:windowtext;"><div><span style="color:black;">1</span></div></td>
<td class="ms-rteTableEvenCol-1" valign="bottom" style="border-width:medium 1pt 1pt medium;border-style:none solid solid none;padding:0in 5.4pt;width:80px;height:15pt;border-right-color:windowtext;border-bottom-color:windowtext;"><div><span style="color:black;">8,25</span></div></td></tr>
Upvotes: 0
Views: 2758
Reputation: 1152
I have to make some assumptions here to provide you with an answer. I'm assuming that you are working with a complete HTML document. If you are not, please update your requirements, as it might be easier to just treat your document as XML.
Retrieve that document with Invoke-WebRequest:
$html = invoke-webrequest "http://www.yourpath.here"
Now I am going to assume the page contains only one table. The line below gets the first table in the returned document. If you do not want the first table, you can either change the index or use a where clause to select the table you want based on some criteria.
$table = $html.parsedHtml.getElementsByTagName("table")[0]
Because I don't know the entire contents of your table, I'm going to assume that "Chile" does not appear anywhere else in it. This needs to be true because I am taking a simple approach: I match against the whole innerText of each TR and ignore its inner structure. If that is not the case, you will need additional logic to check only the TH inside the TR.
$TR = $table.getElementsByTagName("tr") | where { $_.innerText -like "*Chile*" }
Next we can grab all of the TD elements:
$TD = $TR.getElementsByTagName("td")
At this point you have all of the TD objects in an array. You can dump the contents with:
$TD | foreach { $_.innerText }
Oddly, just doing $TD.innerText will not yield this output.
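If parsedHtml is not available to you (it depends on the Internet Explorer engine and is not returned by Invoke-WebRequest in PowerShell 6 and later), a regex fallback that does not depend on the font color might look like the sketch below. This is a different technique from the DOM steps above, not a replacement for them. The sample HTML is shortened from the question, the second row is made up to show the TH-only check, and the Get-CellText helper name and the output path are hypothetical.

```powershell
# Hypothetical sample: the Chile row (shortened) plus a made-up row that
# mentions "Chile" only inside a TD, so it must not be selected.
$content = @'
<table>
<tr><th><div><b><span style="color:black;">Chile</span></b></div></th>
<td class="ms-rteTableOddCol-1">2</td>
<td class="ms-rteTableEvenCol-1">&nbsp;</td>
<td class="ms-rteTableOddCol-1">14,19</td>
<td class="ms-rteTableEvenCol-1"><div><span style="color:black;">1</span></div></td></tr>
<tr><th>Peru</th><td>see Chile</td></tr>
</table>
'@

# Helper (hypothetical name): strip nested tags, decode entities such as
# &nbsp;, and trim surrounding whitespace.
function Get-CellText([string]$Html) {
    $text = $Html -replace '<[^>]+>', ''
    [System.Net.WebUtility]::HtmlDecode($text).Trim()
}

# (?is) = ignore case, and let . match across line breaks.
$values = foreach ($row in [regex]::Matches($content, '(?is)<tr\b[^>]*>.*?</tr>')) {
    $th = [regex]::Match($row.Value, '(?is)<th\b[^>]*>(.*?)</th>')
    # Check only the TH text, so "Chile" inside a TD does not select the row.
    if ($th.Success -and (Get-CellText $th.Groups[1].Value) -eq 'Chile') {
        'Chile'
        [regex]::Matches($row.Value, '(?is)<td\b[^>]*>(.*?)</td>') |
            ForEach-Object { Get-CellText $_.Groups[1].Value }
    }
}

# Hypothetical output path; one value per line, blank lines for empty cells.
$outfile = Join-Path ([System.IO.Path]::GetTempPath()) 'chile.txt'
$values | Out-File -FilePath $outfile -Encoding ascii
```

Empty cells come out as empty strings; pipe through `Where-Object { $_ }` before Out-File if you want to drop them. A regex like this assumes the rows contain no nested tables, so prefer the DOM approach when it is available.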
Upvotes: 1