Here is my scenario: for each <td> in the <tr>, I want to get its child tags together with their content. I'm able to get the text content, but not the tags themselves, since there are too many elements inside.
The return should be the <p> tag with its content, and the <table> element.
HTML:
<table>
  <tr>
    <td>
      <!-- first element -->
      <p> MY TEXT </p>
      <!-- end element -->
    </td>
    <td>
      <!-- second element -->
      <table>
        <tbody>
          <tr>
            <td>
              <p> MY TEXT </p>
            </td>
            <td>
              <p> MY TEXT </p>
            </td>
          </tr>
          <tr>
            <td>
              <p> MY TEXT </p>
            </td>
          </tr>
        </tbody>
      </table>
      <!-- end element -->
    </td>
  </tr>
</table>
Code:
from bs4 import BeautifulSoup
html = '''
<table>
<tr>
<td>
<!-- first element -->
<p> MY TEXT </p>
<!-- end element -->
</td>
<td>
<!-- second element -->
<table>
<tbody>
<tr>
<td>
<p> MY TEXT </p>
</td>
<td>
<p> MY TEXT </p>
</td>
</tr>
<tr>
<td>
<p> MY TEXT </p>
</td>
</tr>
</tbody>
</table>
<!-- end element -->
</td>
</tr>
</table>
'''
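soup = BeautifulSoup(html, 'html.parser')
# find_all() searches the whole tree, so it also returns the <p> tags inside the nested table
print("The <p> tag with its content:")
print(soup.find_all('p'))
# find() returns the first match in document order, i.e. the outer <table>
print("\nThe <table> element:")
print(soup.find('table').prettify())
Output:
The <p> tag with its content: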
[<p> MY TEXT </p>, <p> MY TEXT </p>, <p> MY TEXT </p>, <p> MY TEXT </p>]
The <table> element:
<table>
<tr>
<td>
<!-- first element -->
<p>
MY TEXT
</p>
<!-- end element -->
</td>
<td>
<!-- second element -->
<table>
<tbody>
<tr>
<td>
<p>
MY TEXT
</p>
</td>
<td>
<p>
MY TEXT
</p>
</td>
</tr>
<tr>
<td>
<p>
MY TEXT
</p>
</td>
</tr>
</tbody>
</table>
<!-- end element -->
</td>
</tr>
</table>
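The find_all('p') call above returns every <p> in the document, including the three inside the nested table, so it does not separate the two elements the question asks for. A minimal sketch of one alternative is to walk only the direct children of the outer row with recursive=False. It assumes html.parser, which keeps the markup as written; with a parser that inserts a <tbody> into the outer table, the outer <tr> would no longer be a direct child of <table>. The variable names (outer_table, outer_row, results) are illustrative only.

```python
from bs4 import BeautifulSoup

html = '''
<table>
<tr>
<td>
<!-- first element -->
<p> MY TEXT </p>
<!-- end element -->
</td>
<td>
<!-- second element -->
<table>
<tbody>
<tr>
<td>
<p> MY TEXT </p>
</td>
<td>
<p> MY TEXT </p>
</td>
</tr>
<tr>
<td>
<p> MY TEXT </p>
</td>
</tr>
</tbody>
</table>
<!-- end element -->
</td>
</tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')

# The first <table> in document order is the outer one
outer_table = soup.find('table')
# recursive=False: only the <tr> directly under the outer <table>, not the nested rows
outer_row = outer_table.find('tr', recursive=False)

results = []
for td in outer_row.find_all('td', recursive=False):
    # True matches any tag; comments and whitespace strings are skipped,
    # and recursive=False stops at the direct children of this <td>
    results.extend(td.find_all(True, recursive=False))

for child in results:
    print(child.name)  # tag name, e.g. "p" or "table"
    print(child)       # the tag with all of its content
```

With this input, results should hold the <p> from the first cell and the whole nested <table> from the second cell. If only the inner HTML of each element is wanted, child.decode_contents() can be used in place of str(child).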