Reputation: 327
I am familiar with BeautifulSoup and regular expressions as ways of extracting text from HTML, but less familiar with alternatives such as ElementTree and minidom.
My question is fairly straightforward. Given the HTML snippet below, which library is best for extracting the integer stored in the data-tooltip attribute (7,944,796)?
<td class="tl-cell tl-popularity" data-tooltip="7,944,796" data-tooltip-instant="">
<div class="pop-meter">
<div class="pop-meter-background"></div>
<div class="pop-meter-overlay" style="width: 55%"></div>
</div>
</td>
Upvotes: 0
Views: 1345
Reputation: 474061
With BeautifulSoup it is fairly straightforward:
from bs4 import BeautifulSoup
data = """
<td class="tl-cell tl-popularity" data-tooltip="7,944,796" data-tooltip-instant="">
<div class="pop-meter">
<div class="pop-meter-background"></div>
<div class="pop-meter-overlay" style="width: 55%"></div>
</div>
</td>
"""
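# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(data, "html.parser")
# Tag attributes are accessed like dictionary keys
print(soup.td['data-tooltip'])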
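The attribute value comes back as a string with thousands separators. If an actual integer is needed, a minimal sketch could look like this (the helper name `tooltip_to_int` is hypothetical, and it assumes the value always uses commas as separators):

```python
def tooltip_to_int(value):
    """Convert a string such as "7,944,796" to an int by dropping the commas."""
    return int(value.replace(",", ""))


print(tooltip_to_int("7,944,796"))  # 7944796
```

A value without separators, such as "42", passes through unchanged because `replace` simply finds nothing to remove.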
If you have multiple td elements and need to extract the data-tooltip value from each one:
for td in soup.find_all('td', {'data-tooltip': True}):
    print(td['data-tooltip'])
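Since the question also asks about other libraries: as a rough sketch, the same attribute can be collected with the standard library's `html.parser` module alone. The class name `TooltipCollector` is made up for illustration, and the cell is wrapped in a table here only to keep the sample markup valid.

```python
from html.parser import HTMLParser


class TooltipCollector(HTMLParser):
    """Collects the data-tooltip value of every <td> that has one."""

    def __init__(self):
        super().__init__()
        self.tooltips = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names
        if tag == "td":
            for name, value in attrs:
                if name == "data-tooltip":
                    self.tooltips.append(value)


html = (
    '<table><tr>'
    '<td class="tl-cell tl-popularity" data-tooltip="7,944,796" data-tooltip-instant="">'
    '<div class="pop-meter"></div>'
    '</td>'
    '</tr></table>'
)

collector = TooltipCollector()
collector.feed(html)
print(collector.tooltips)  # ['7,944,796']
```

This avoids a third-party dependency, at the cost of more boilerplate than the BeautifulSoup version.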
Upvotes: 3