Pixy

Reputation: 61

Regex: negative look-ahead with repeated HTML tag (nested tables)

I want to match the text from an opening <table tag to the closing </table> tag, provided that no other opening <table tag occurs in between. Example:

<html><body><title>1</title><br><table width=300 height=35 border=0 cellspacing=0 cellpadding=0><tr><td align=center width=50><table width=300 border=0 cellspacing=0 cellpadding=0><tr><td>A</td></tr></table><tr><td><br>B</td></tr></table>
    <br><td><tr></td></tr><table width=300 height=35 border=0 cellspacing=0 cellpadding=0><tr><td align=center width=50><table width=300 border=0 cellspacing=0 cellpadding=0><tr><td>C</td></tr></table><tr><td><br>D</td></tr></table>
    <td><tr></td></tr><br><table width=300 height=35 border=0 cellspacing=0 cellpadding=0><tr><td align=center width=50><table width=300 border=0 cellspacing=0 cellpadding=0><tr><td>E</td></tr></table><tr><td><br>F</td></tr></table>

Any <table that is followed by another <table before the next </table> should be disregarded. Consequently, the first match should be:

<table width=300 border=0 cellspacing=0 cellpadding=0><tr><td>A</td></tr></table>

The match should not be:

<table width=300 height=35 border=0 cellspacing=0 cellpadding=0><tr><td align=center width=50><table width=300 border=0 cellspacing=0 cellpadding=0><tr><td>A</td></tr></table>

...and so on.

Regex tried: (<table(?!<table)(.+?)</table>)
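A note on why this attempt likely misbehaves: the negative look-ahead (?!<table) is checked only once, at the position immediately after the first <table, so the lazy (.+?) is still free to run across a later <table. One common workaround is to repeat the look-ahead before every consumed character (often called a "tempered" dot). The sketch below uses Python's re module purely for illustration, since the question does not name a regex flavor; the shortened sample string and the helper name innermost_tables are assumptions, not part of the original post.

```python
import re

# Shortened version of the first sample line from the question (assumption:
# attributes trimmed for readability; the nesting is unchanged).
html = (
    "<table width=300 height=35><tr><td align=center>"
    "<table width=300 border=0><tr><td>A</td></tr></table>"
    "<tr><td><br>B</td></tr></table>"
)

# The attempted pattern: the look-ahead is tested only once, right after "<table".
attempted = re.compile(r"(<table(?!<table)(.+?)</table>)", re.DOTALL)

# Tempered dot: before consuming each character, assert that no "<table"
# starts at that position. A match therefore cannot contain a nested opener.
innermost = re.compile(
    r"<table\b(?:(?!<table\b).)*?</table>",
    re.DOTALL | re.IGNORECASE,
)

def innermost_tables(text):
    """Return every <table ...>...</table> span that contains no other <table."""
    return innermost.findall(text)

if __name__ == "__main__":
    # The attempted pattern starts at the outer table.
    print(attempted.search(html).group(1))
    # The tempered pattern returns only the inner table.
    print(innermost_tables(html))
```

With this pattern the outer opening tag is skipped, because the scan from it hits a second <table before any </table>, and the first match begins at the inner table. Regular expressions remain fragile on real-world HTML (comments, scripts, or attribute values containing "<table" would all confuse it), so an HTML parser may be the safer choice if the input is not tightly controlled.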

Upvotes: 0

Views: 103

Answers (0)
