Briefbreaddd

Reputation: 379

How can I get td contents, including tags, in BeautifulSoup?

Here is my scenario: I want to get the child tags of each td in the tr, along with their content. I'm able to get the text content but not the tags, because there are too many nested elements inside.

The result should be:

  1. The p tag with its content
  2. The table element

HTML:

<table>
    <tr>

        <td>
        <!-- first element -->
            <p> MY TEXT </p>
        <!-- end element -->
        </td>

        <td>
        <!-- second element -->
            <table>
                <tbody>
                    <tr>
                        <td>
                            <p> MY TEXT </p>
                        </td>
                        <td>
                            <p> MY TEXT </p>
                        </td>
                    </tr>
                    <tr>
                        <td>
                            <p> MY TEXT </p>
                        </td>
                    </tr>
                </tbody>
            </table>
        <!-- end element -->
        </td>

    </tr>
</table>
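A minimal sketch of the expected result, assuming "the td children tags" means the tags sitting directly inside each td of the outer row. It calls find_all(True, recursive=False) on each cell. True matches any tag (whitespace and comments are skipped), and recursive=False restricts the search to immediate children, so the p tags nested in the inner table are not collected on their own. The helper name direct_child_tags is hypothetical.

```python
from bs4 import BeautifulSoup

html = '''
<table>
    <tr>
        <td>
        <!-- first element -->
            <p> MY TEXT </p>
        <!-- end element -->
        </td>
        <td>
        <!-- second element -->
            <table>
                <tbody>
                    <tr>
                        <td><p> MY TEXT </p></td>
                        <td><p> MY TEXT </p></td>
                    </tr>
                    <tr>
                        <td><p> MY TEXT </p></td>
                    </tr>
                </tbody>
            </table>
        <!-- end element -->
        </td>
    </tr>
</table>
'''


def direct_child_tags(markup):
    """Hypothetical helper: for each <td> in the outer row, return the
    tags that are immediate children of that cell."""
    soup = BeautifulSoup(markup, 'html.parser')
    # The first <tr> in document order belongs to the outer table.
    outer_row = soup.find('tr')
    cells = outer_row.find_all('td', recursive=False)
    # True matches any tag (text and comments are skipped);
    # recursive=False stops at the cell's immediate children.
    return [cell.find_all(True, recursive=False) for cell in cells]


for children in direct_child_tags(html):
    for tag in children:
        print(tag)
```

Running it prints the first cell's p element and then the whole inner table element, which correspond to items 1 and 2 in the list above.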

Upvotes: 1

Views: 175

Answers (1)

Ali

Reputation: 1357

Code:

from bs4 import BeautifulSoup

html = '''
<table>
    <tr>
        <td>
        <!-- first element -->
            <p> MY TEXT </p>
        <!-- end element -->
        </td>
        <td>
        <!-- second element -->
            <table>
                <tbody>
                    <tr>
                        <td>
                            <p> MY TEXT </p>
                        </td>
                        <td>
                            <p> MY TEXT </p>
                        </td>
                    </tr>
                    <tr>
                        <td>
                            <p> MY TEXT </p>
                        </td>
                    </tr>
                </tbody>
            </table>
        <!-- end element -->
        </td>
    </tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')
print("The <p> tag with its content:")
print(soup.find_all('p'))
print("\nThe <table> element:")
print(soup.find('table').prettify())

Output:

The <p> tag with its content:
[<p> MY TEXT </p>, <p> MY TEXT </p>, <p> MY TEXT </p>, <p> MY TEXT </p>]

The <table> element:
<table>
 <tr>
  <td>
   <!-- first element -->
   <p>
    MY TEXT
   </p>
   <!-- end element -->
  </td>
  <td>
   <!-- second element -->
   <table>
    <tbody>
     <tr>
      <td>
       <p>
        MY TEXT
       </p>
      </td>
      <td>
       <p>
        MY TEXT
       </p>
      </td>
     </tr>
     <tr>
      <td>
       <p>
        MY TEXT
       </p>
      </td>
     </tr>
    </tbody>
   </table>
   <!-- end element -->
  </td>
 </tr>
</table>
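Note that soup.find_all('p') above matches every p in the document, including the three inside the nested table, and soup.find('table') returns the outer table rather than the inner one. If the goal is instead the raw markup inside each outer td, a minimal sketch under that assumption uses Tag.decode_contents(), which renders a tag's children (tags, text, and comments) as a string without the enclosing tag itself. The variable names outer_row and cell_markup are illustrative.

```python
from bs4 import BeautifulSoup

html = '''
<table>
    <tr>
        <td>
        <!-- first element -->
            <p> MY TEXT </p>
        <!-- end element -->
        </td>
        <td>
        <!-- second element -->
            <table>
                <tbody>
                    <tr>
                        <td><p> MY TEXT </p></td>
                        <td><p> MY TEXT </p></td>
                    </tr>
                    <tr>
                        <td><p> MY TEXT </p></td>
                    </tr>
                </tbody>
            </table>
        <!-- end element -->
        </td>
    </tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')
outer_row = soup.find('tr')  # first <tr> in document order is the outer row

# decode_contents() returns everything inside each <td> as a string,
# keeping the child tags and comments but dropping the <td> wrapper.
cell_markup = [td.decode_contents().strip()
               for td in outer_row.find_all('td', recursive=False)]

for markup in cell_markup:
    print(markup)
    print('-' * 40)
```

Each entry in cell_markup is a plain string, so it can be stored or re-parsed later; use the Tag objects from find_all instead if you need to keep navigating the tree.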

Upvotes: 1
