Jayson

Reputation: 2031

What regex would match a nested table with identifiable text in the table cell?

I've tried but failed to come up with a regular expression that extracts the specific table I want without also grabbing the beginning and end of both tables in the example. Here is something to get started: "<table>.*?</table>"

<table>
    <tr>
        <td>
            <table>
                <tr><td>Code1</td></tr>
                <tr><td>some data</td></tr>
                <tr><td>etc ...</td></tr>
            </table>
        </td>
    </tr>
    <tr>
        <td>
            <table>
                <tr><td>Code2</td></tr>
                <tr><td>some data</td></tr>
                <tr><td>etc ...</td></tr>
            </table>
        </td>
    </tr>
</table>

Say I want to extract the table containing "Code2". What regex will match specifically and only that table?

Upvotes: 1

Views: 4558

Answers (3)

tangens

Reputation: 39743

The following regex will find your table:

(?ms)<table>((?!<table>).)*<td>Code2</td>.*?</table>

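With (?ms) you turn on multiline mode (m), which makes ^ and $ match at line boundaries and is not strictly needed here, and "dot matches newlines, too" (s). The negative lookahead (?!<table>) is checked before every character consumed, so no second <table> start tag can appear between the opening tag and the Code2 cell. The lazy .*? then stops at the first </table> after that cell.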
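To see the pattern at work, here is a minimal sketch in Python (my choice of language; the question doesn't name one). The `HTML` sample is a shortened copy of the markup from the question, and `extract_code2_table` is a hypothetical helper name used only for illustration.

```python
import re

# Shortened copy of the nested-table markup from the question.
HTML = """<table>
    <tr>
        <td>
            <table>
                <tr><td>Code1</td></tr>
                <tr><td>some data</td></tr>
            </table>
        </td>
    </tr>
    <tr>
        <td>
            <table>
                <tr><td>Code2</td></tr>
                <tr><td>some data</td></tr>
            </table>
        </td>
    </tr>
</table>"""

# The regex from the answer, unchanged. The tempered dot ((?!<table>).)*
# refuses to step over another <table> start tag.
PATTERN = re.compile(r"(?ms)<table>((?!<table>).)*<td>Code2</td>.*?</table>")


def extract_code2_table(html):
    """Return the text of the table containing the Code2 cell, or None."""
    match = PATTERN.search(html)
    return match.group(0) if match else None


print(extract_code2_table(HTML))
```

This only holds for markup shaped like the sample. A <table> tag with attributes, a differently cased tag, or a further table nested inside the Code2 table would all defeat it.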

Upvotes: 4

tster

Reputation: 18247

Don't use a regex. Use an HTML parser!

However, in Perl (assuming you don't have nested tables):

$xml =~ /<table>.*<td>Code2<\/td>.*<\/table>/s;

Upvotes: 1

Brian Agnew

Reputation: 272347

I wouldn't use a regexp for this: HTML isn't a regular language, and there is no end of edge cases to trip you up. You're better off using an HTML parser. Whichever language or platform you're using, there'll be one available.
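As one illustration of the parser route, here is a minimal sketch using Python's standard-library `html.parser`. The language is an assumption on my part, and `TableFinder` and `find_table` are hypothetical names. It keeps a stack of open tables and returns the innermost one whose text contains the wanted string.

```python
from html.parser import HTMLParser


class TableFinder(HTMLParser):
    """Rebuild the markup of the innermost <table> holding the needle text."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.open_tables = []  # stack of [pieces, contains_needle]
        self.found = None

    def _emit(self, text):
        # Every currently open table receives a copy of this piece of markup.
        for entry in self.open_tables:
            entry[0].append(text)

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.open_tables.append([[], False])
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        self._emit(f"</{tag}>")
        if tag == "table" and self.open_tables:
            pieces, hit = self.open_tables.pop()
            if hit and self.found is None:
                self.found = "".join(pieces)

    def handle_data(self, data):
        # Flag only the innermost open table, so the outer table is not returned.
        if self.open_tables and data.strip() == self.needle:
            self.open_tables[-1][1] = True
        self._emit(data)


def find_table(html, needle):
    finder = TableFinder(needle)
    finder.feed(html)
    finder.close()
    return finder.found


sample = (
    "<table><tr><td>"
    "<table><tr><td>Code1</td></tr></table>"
    "</td></tr><tr><td>"
    "<table class='x'><tr><td>Code2</td></tr></table>"
    "</td></tr></table>"
)
print(find_table(sample, "Code2"))
```

Unlike the regex, this copes with attributes on the tags and with tables nested to any depth. A dedicated library with selector or XPath support would likely be shorter still.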

Upvotes: 6
