Reputation: 2781
I know regular expressions are not the right tool for parsing HTML, but that is the approach I have to use here.
Given the HTML below, I want to extract all the select elements from the HTML table. For this I used:
<table id='options_table'>\s*?(.+)?\s*?</table>
But this gives me a null result.
To then extract every select from the table matched by that regex, I would use:
<SELECT.*?>(.*?)<\/SELECT>
But both of the above return a null result.
What should the regexes be for the table and for the selects (within the parsed table HTML)?
HTML Part
<table id='options_table'>
<tr><td colspan=3><font size="3" class="colors_productname">
<i><b>Color</b></i>
</font>
<br /><table cellpadding="0" cellspacing="0" border="0"><tr><td><img class="vCSS_img_line_group_features" src="/v/vspfiles/templates/192/images/Line_Group_Features.gif" /></td></tr></table>
</font></td></tr>
<tr>
<td align="right" vAlign="top">
<img src="/v/vspfiles/templates/192/images/clear1x1.gif" width="1" height="4" border="0"><br />
</td><td></td><td>
<SELECT name="SELECT___S15FTAN01___29" onChange="change_option('SELECT___S15FTAN01___29',this.options[this.selectedIndex].value)">
<OPTION value="176" >Ivory/Grey</OPTION>
</SELECT>
</td></tr>
<tr>
<td align="right" vAlign="top">
<img src="/v/vspfiles/templates/192/images/clear1x1.gif" width="1" height="4" border="0"><br />
</td><td></td><td>
<SELECT name="SELECT___S15FTAN01___31" onChange="change_option('SELECT___S15FTAN01___31',this.options[this.selectedIndex].value)">
<OPTION value="167" >0/3 months</OPTION>
<OPTION value="169" >3/6 months</OPTION>
<OPTION value="175" >6/9 months</OPTION>
</SELECT>
</td></tr>
</table>
Upvotes: 0
Views: 1203
Reputation: 1940
I don't know Go, but I can show you in Perl, and I think you will be able to translate it to Go.
First, a regex to capture the content of the table tag (https://regex101.com/r/tL7dA0/1):
$table = $1 if ($html =~ m/<table.*?>(.*)<\/table>/igs);
Next, a regex to print everything between the select tags (https://regex101.com/r/xJ0xU1/1):
while ($table =~ m/<select.*?>(.*?)<\/select>/isg){
print $1."\n";
}
Note that if the HTML table contains an inner table, as in your case, the greedy (.*) runs to the last closing table tag, so all the content of the outer table is captured, inner table included.
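For reference, here is a rough sketch of how the two Perl regexes might carry over to Go's regexp package. The helper name selectBodies and the trimmed sample HTML are made up for illustration, and this has only been reasoned through against markup shaped like the question's, so treat it as a starting point rather than a tested solution:

```go
package main

import (
	"fmt"
	"regexp"
)

// (?i) makes the match case-insensitive, (?s) lets "." match newlines.
// The greedy (.*) runs to the LAST </table>, so a nested table stays
// inside the captured outer content.
var tableRe = regexp.MustCompile(`(?is)<table.*?>(.*)</table>`)

// The non-greedy (.*?) stops at the first </select> after each opening tag.
var selectRe = regexp.MustCompile(`(?is)<select.*?>(.*?)</select>`)

// selectBodies (hypothetical helper) returns the inner text of every
// select element found inside the first outer table, or nil if no table matches.
func selectBodies(html string) []string {
	m := tableRe.FindStringSubmatch(html)
	if m == nil {
		return nil
	}
	var out []string
	// n = -1 asks for all matches, like Perl's /g in a while loop.
	for _, s := range selectRe.FindAllStringSubmatch(m[1], -1) {
		out = append(out, s[1])
	}
	return out
}

func main() {
	// Shortened stand-in for the HTML in the question.
	html := `<table id='options_table'>
<tr><td><table><tr><td>inner</td></tr></table></td></tr>
<tr><td>
<SELECT name="SELECT___S15FTAN01___29">
<OPTION value="176" >Ivory/Grey</OPTION>
</SELECT>
</td></tr>
<tr><td>
<SELECT name="SELECT___S15FTAN01___31">
<OPTION value="167" >0/3 months</OPTION>
<OPTION value="169" >3/6 months</OPTION>
</SELECT>
</td></tr>
</table>`
	for _, body := range selectBodies(html) {
		fmt.Println(body)
	}
}
```

Each returned string is the raw text between one pair of select tags, so the OPTION lines still need a further pass if you want the individual values.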
i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
s modifier: single line. Dot matches newline characters
g modifier: global. All matches (don't return on first match)
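In Go these modifiers are written differently, so here is a small sketch of the mapping as I understand it: (?i) and (?s) go inside the pattern, and the FindAll methods with n = -1 play the role of g. By default Go's "." does not match a newline, which is probably why a pattern that has to span several lines of HTML matched nothing. The countMatches helper and the sample text are made up for the demonstration:

```go
package main

import (
	"fmt"
	"regexp"
)

// countMatches (hypothetical helper) compiles the pattern and counts
// every non-overlapping match in text; n = -1 means "all matches".
func countMatches(pattern, text string) int {
	re := regexp.MustCompile(pattern)
	return len(re.FindAllString(text, -1))
}

func main() {
	text := "<b>one</b>\n<B>two\nlines</B>"

	// No flags: case-sensitive, and "." stops at a newline,
	// so only the first, single-line lowercase element matches.
	fmt.Println(countMatches(`<b>(.*?)</b>`, text))

	// (?i) ignores case and (?s) lets "." cross newlines,
	// so the uppercase, multi-line element matches as well.
	fmt.Println(countMatches(`(?is)<b>(.*?)</b>`, text))
}
```

FindString and FindStringSubmatch return only the first match, which corresponds to leaving off the g modifier.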
Upvotes: 1