Reputation: 951
I'd like to create a script that grabs two values from this awful HTML published on a city website:
558.35
and
66.0
These are water reservoir details and change weekly.
I'm unsure what the best tool for this is. grep?
Thanks for any suggestions or ideas!
<table>
<tbody>
<tr>
<td> Currently:</td>
<td> 558.35</td>
</tr>
<tr>
<td> Percent of capacity:</td>
<td> 66.0%</td>
</tr>
</tbody>
</table>
Upvotes: 1
Views: 306
Reputation: 8412
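If you want to use a regex, sed can do it. Note that the pattern below expects a semicolon (for example the end of an &nbsp; entity) right before the number, so it will not match cells that contain only a plain space there: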
sed -nr 's#^[ ]*<td>.*;[ ]?([0-9]+[.][0-9]+)[%]?</td>[ ]*$#\1#p' my_html_file
An HTML parser, such as Python's BeautifulSoup module, or a JavaScript approach is a safer choice.
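To illustrate the parser route, here is a minimal sketch in Python. It uses the standard library's html.parser rather than BeautifulSoup, only to avoid a third-party dependency. The names CellCollector, reservoir_values and SAMPLE are made up for this example, and it assumes every table row holds exactly one label cell followed by one value cell, as in the question's markup. Fetching the page is left out because the URL isn't given.

```python
import re
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, in document order."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &nbsp; into characters
        super().__init__(convert_charrefs=True)
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data


def reservoir_values(html):
    """Return {label: number}, assuming cells alternate label, value."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    # strip() also removes non-breaking spaces (U+00A0)
    cells = [cell.strip() for cell in parser.cells]
    result = {}
    for label, value in zip(cells[0::2], cells[1::2]):
        match = re.search(r"[0-9]+(?:\.[0-9]+)?", value)
        if match:
            result[label.rstrip(":")] = float(match.group())
    return result


# Sample markup copied from the question
SAMPLE = """
<table>
<tbody>
<tr>
<td> Currently:</td>
<td> 558.35</td>
</tr>
<tr>
<td> Percent of capacity:</td>
<td> 66.0%</td>
</tr>
</tbody>
</table>
"""

print(reservoir_values(SAMPLE))
# {'Currently': 558.35, 'Percent of capacity': 66.0}
```

Because the values are keyed by their labels, the script doesn't depend on how the lines are indented or on whether the padding before each number is a plain space or an &nbsp; entity.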
EDIT:
Here is a snippet using JavaScript. The results are logged to the console, and an alert box pops up to show them.
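// Look up the <td> cells once instead of on every loop iteration.
var cells = document.getElementsByTagName('td');
var values = [];
for (var i = 0; i < cells.length; ++i) {
  // Strip whitespace (including non-breaking spaces) and the percent sign.
  var text = cells[i].textContent.replace(/[\s%]/g, "");
  // Keep only cells that are purely numeric, which skips the label cells.
  if (/^[0-9]+(\.[0-9]+)?$/.test(text)) values.push(text);
}
alert(values.join(" "));
console.log(values.join(" "));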
<table>
<tbody>
<tr>
<td> Currently:</td>
<td> 558.35</td>
</tr>
<tr>
<td> Percent of capacity:</td>
<td> 66.0%</td>
</tr>
</tbody>
</table>
Upvotes: 2