Sharanabasu Angadi

Reputation: 4372

Read only the rows of the outer table using BeautifulSoup

  <table id="GridView1">
  <tbody>
    <tr>
        <td>Some Content </td>
        <td>Some Content </td>
    </tr>
    <tr>
        <td>Some Content </td>
        <td>Some Content </td>
    </tr>
    <tr>
        <table>
            <tbody>
                <tr>
                    <td>Some Content </td>
                    <td>Some Content </td>
                </tr>
                <tr>
                    <td>Some Content </td>
                    <td>Some Content </td>
                </tr>
                <tr>
                    <td>Some Content </td>
                    <td>Some Content </td>
                </tr>
            </tbody>
        </table>
    </tr>
  </tbody>
  </table>

I have some HTML with table content like the above: the outer table contains further nested tables.

When I read the tr elements using BeautifulSoup like this:

table_grid_1 = soup.find("table", {"id": "GridView1"})
rows = table_grid_1.find("tbody").find_all("tr")

the rows of the inner table are read as well.

That is, `print "length " + str(len(rows))` prints 5, but I want to read only the tr elements of the outer table, so the length should be 3.

How can I read only the rows of the outer table?

Upvotes: 2

Views: 336

Answers (2)

DJanssens

Reputation: 20709

You can achieve this by passing recursive=False to find_all, which restricts the search to the direct children of the tbody:

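from bs4 import BeautifulSoup

# html is assumed to hold the markup from the question
soup = BeautifulSoup(html, "html.parser")
table_grid_1 = soup.find("table", {"id": "GridView1"})
# recursive=False looks only at the direct children of the tbody
rows = table_grid_1.find("tbody").find_all("tr", recursive=False)
print(len(rows))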

which returns 3.
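
To check this end to end, here is a minimal self-contained sketch on a trimmed-down version of the question's markup. The id="GridView1" attribute on the outer table and the use of the built-in "html.parser" are assumptions: the posted HTML shows no id, and other parsers may restructure a table nested directly inside a tr.

```python
from bs4 import BeautifulSoup

# Trimmed-down, hypothetical version of the question's markup:
# two plain outer rows plus a third row that holds a nested table.
html = """
<table id="GridView1">
  <tbody>
    <tr><td>Some Content</td><td>Some Content</td></tr>
    <tr><td>Some Content</td><td>Some Content</td></tr>
    <tr>
      <table>
        <tbody>
          <tr><td>Some Content</td><td>Some Content</td></tr>
          <tr><td>Some Content</td><td>Some Content</td></tr>
        </tbody>
      </table>
    </tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
outer_tbody = soup.find("table", {"id": "GridView1"}).find("tbody")

# Default search walks all descendants, so nested rows are included.
all_rows = outer_tbody.find_all("tr")
# recursive=False only considers direct children of the outer tbody.
outer_rows = outer_tbody.find_all("tr", recursive=False)

print(len(all_rows))    # 5
print(len(outer_rows))  # 3
```

With this markup the default search counts the two nested rows as well, which matches the 5 versus 3 difference described in the question.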

Upvotes: 2

mmachine

Reputation: 926

You may try:

[x.string for x in soup.select('table > tbody > tr > td') if x not in soup.select('table > tbody > tr > table > tbody > tr > td')]

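Result: only the content of the outer table's td cells. Note: it returns an empty list if the outer and inner td elements are equal, because `not in` compares tags by value (same name, attributes and contents) rather than by identity.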
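
The caveat in the note can be avoided by comparing by identity instead of by value. The following is only a sketch of one possible variation, not part of the answer itself: it keeps a td only if its nearest enclosing table is the outer table object. The markup, the id="GridView1" attribute, and the "html.parser" choice are assumptions carried over for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: every cell has identical content, as in the question.
html = """
<table id="GridView1">
  <tbody>
    <tr><td>Some Content</td><td>Some Content</td></tr>
    <tr><td>Some Content</td><td>Some Content</td></tr>
    <tr>
      <table>
        <tbody>
          <tr><td>Some Content</td><td>Some Content</td></tr>
          <tr><td>Some Content</td><td>Some Content</td></tr>
        </tbody>
      </table>
    </tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
outer_table = soup.find("table", {"id": "GridView1"})

# Value-based filtering: outer cells compare equal to inner cells
# with the same content, so every cell is filtered out.
all_cells = soup.select("table > tbody > tr > td")
inner_cells = soup.select("table > tbody > tr > table > tbody > tr > td")
by_value = [td.string for td in all_cells if td not in inner_cells]

# Identity-based filtering: keep a cell only if its closest
# ancestor table is the outer table object itself.
by_identity = [
    td.get_text(strip=True)
    for td in outer_table.find_all("td")
    if td.find_parent("table") is outer_table
]

print(by_value)          # []
print(len(by_identity))  # 4
```

The identity check does not depend on cell contents, so it still works when the outer and inner cells look the same.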

Upvotes: 0
