Naresh

Reputation: 2781

Regular expression to parse Table and Select from HTML

I know a regular expression is not the right tool for this parsing job, but it is the approach recommended on my side.

I have the HTML shown below, and I want to parse all the select info from the HTML table. For this I have used:

<table id='options_table'>\s*?(.+)?\s*?</table>

But this gives me a null result.

Then, to parse all the selects out of what the above regex returns, I will use:

<SELECT.*?>(.*?)<\/SELECT>

But both of the above return a null result.

What should the regexes be for the table and for the selects (from the parsed table HTML)?

HTML Part

<table id='options_table'>
    <tr><td colspan=3><font size="3" class="colors_productname">
    <i><b>Color</b></i>
    </font>
    <br /><table cellpadding="0" cellspacing="0" border="0"><tr><td><img class="vCSS_img_line_group_features" src="/v/vspfiles/templates/192/images/Line_Group_Features.gif" /></td></tr></table>
    </font></td></tr>
    <tr>
    <td align="right" vAlign="top">
    <img src="/v/vspfiles/templates/192/images/clear1x1.gif" width="1" height="4" border="0"><br />
    </td><td></td><td>
    <SELECT name="SELECT___S15FTAN01___29" onChange="change_option('SELECT___S15FTAN01___29',this.options[this.selectedIndex].value)">
    <OPTION value="176" >Ivory/Grey</OPTION>
    </SELECT>&nbsp;&nbsp;
    </td></tr>
    <tr>
    <td align="right" vAlign="top">
    <img src="/v/vspfiles/templates/192/images/clear1x1.gif" width="1" height="4" border="0"><br />
    </td><td></td><td>
    <SELECT name="SELECT___S15FTAN01___31" onChange="change_option('SELECT___S15FTAN01___31',this.options[this.selectedIndex].value)">
    <OPTION value="167" >0/3 months</OPTION>
    <OPTION value="169" >3/6 months</OPTION>
    <OPTION value="175" >6/9 months</OPTION>
    </SELECT>&nbsp;&nbsp;
    </td></tr>
    </table>

Upvotes: 0

Views: 1203

Answers (1)

Kamal Nayan

Reputation: 1940

I don't know GoLang, but I can show you how to do it in Perl, and I think you will be able to carry it over to Go.
First, a regex to capture the content of the table tag (https://regex101.com/r/tL7dA0/1):

$table = $1 if ($html =~ m/<table.*?>(.*)<\/table>/igs);

A regex for printing everything between the select tags (https://regex101.com/r/xJ0xU1/1):

while ($table =~ m/<select.*?>(.*?)<\/select>/isg) {
    print $1 . "\n";
}
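For reference, here is a rough sketch of how the two Perl steps above might carry over to Go, assuming the standard regexp package is what you are using. The function names tableContent and selectBodies and the sample HTML in main are made up for illustration; the inline (?is) prefix stands in for Perl's i and s modifiers, and FindAllStringSubmatch plays the role of the g loop.

```go
package main

import (
	"fmt"
	"regexp"
)

// (?is) turns on the same i and s modifiers used in the Perl patterns.
var (
	tableRe  = regexp.MustCompile(`(?is)<table.*?>(.*)</table>`)
	selectRe = regexp.MustCompile(`(?is)<select.*?>(.*?)</select>`)
)

// tableContent returns everything between the first <table ...> tag
// and the last </table> tag (hypothetical helper name).
func tableContent(html string) (string, bool) {
	m := tableRe.FindStringSubmatch(html)
	if m == nil {
		return "", false
	}
	return m[1], true
}

// selectBodies returns the inner text of every <select> element in s
// (hypothetical helper name).
func selectBodies(s string) []string {
	var out []string
	for _, m := range selectRe.FindAllStringSubmatch(s, -1) {
		out = append(out, m[1])
	}
	return out
}

func main() {
	// Shortened sample input, not the full HTML from the question.
	html := "<table id='options_table'>\n<tr><td>\n" +
		"<SELECT name=\"a\">\n<OPTION value=\"176\" >Ivory/Grey</OPTION>\n</SELECT>\n" +
		"</td></tr>\n</table>"
	if table, ok := tableContent(html); ok {
		for _, body := range selectBodies(table) {
			fmt.Println(body)
		}
	}
}
```

The slash in </table> does not need escaping here because the patterns are written as Go raw strings rather than between / delimiters.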

Note that if the HTML table contains an inner table, as in your case, the greedy (.*) runs to the last </table>, so all the content of the outer table is captured.

i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
s modifier: single line. Dot matches newline characters
g modifier: global. All matches (don't return on first match)
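The s modifier is probably the one that matters most for the null result. Assuming the original patterns were run through Go's regexp package, the dot does not match a newline by default there, so a pattern that has to cross line breaks finds nothing until the s flag is turned on. A small sketch of that difference (the matches helper and the sample input are made up for illustration):

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern finds anything in input
// (hypothetical helper, compiled on every call for brevity).
func matches(pattern, input string) bool {
	return regexp.MustCompile(pattern).MatchString(input)
}

func main() {
	input := "<SELECT name=\"a\">\n<OPTION value=\"176\" >Ivory/Grey</OPTION>\n</SELECT>"

	// No flags: '.' stops at the newline, so the capture cannot span lines.
	fmt.Println(matches(`<SELECT.*?>(.*?)</SELECT>`, input)) // false

	// (?s): '.' also matches newlines.
	fmt.Println(matches(`(?s)<SELECT.*?>(.*?)</SELECT>`, input)) // true

	// (?is): additionally ignores case, like the Perl i modifier.
	fmt.Println(matches(`(?is)<select.*?>(.*?)</select>`, input)) // true
}
```

In Go the i and s modifiers are written as inline flags at the start of the pattern. There is no g flag; the FindAll family of methods (for example FindAllStringSubmatch with -1 as the limit) returns every match instead.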

Upvotes: 1
