Jake DeVries

Reputation: 327

Extract Text from HTML in Python (BeautifulSoup, RE, Other Options?)

I am familiar with BeautifulSoup and regular expressions as a means of extracting text from HTML, but less familiar with other options such as ElementTree and minidom.

My question is fairly straightforward: given the HTML snippet below, which library is best for extracting the integer it contains (7,944,796)?

<td class="tl-cell tl-popularity" data-tooltip="7,944,796" data-tooltip-instant="">
<div class="pop-meter">
<div class="pop-meter-background"></div>
<div class="pop-meter-overlay" style="width: 55%"></div>
</div>
</td>

Upvotes: 0

Views: 1345

Answers (1)

alecxe

Reputation: 474061

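With BeautifulSoup it is fairly straightforward. Note that the number is not text inside an element but the value of the td's data-tooltip attribute, so you read the attribute: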

from bs4 import BeautifulSoup

data = """
<td class="tl-cell tl-popularity" data-tooltip="7,944,796" data-tooltip-instant="">
<div class="pop-meter">
<div class="pop-meter-background"></div>
<div class="pop-meter-overlay" style="width: 55%"></div>
</div>
</td>
"""

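# Pass a parser explicitly; omitting it makes bs4 guess one and emit a warning.
soup = BeautifulSoup(data, "html.parser")
# soup.td is the first <td>; indexing it returns the attribute value as a string.
print(soup.td['data-tooltip'])  # 7,944,796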
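
The value returned by soup.td['data-tooltip'] is the string '7,944,796', not an integer. A minimal sketch for converting it, assuming the value only ever uses commas as thousands separators (other locales may use '.' or spaces, which this does not handle); tooltip_to_int is a hypothetical helper name:

```python
def tooltip_to_int(value: str) -> int:
    """Convert a comma-grouped number such as '7,944,796' to an int.

    Assumes commas are the only thousands separator; input such as
    '7.944.796' is not handled and makes int() raise ValueError.
    """
    return int(value.replace(",", "").strip())


print(tooltip_to_int("7,944,796"))  # prints 7944796
```

With the soup from above, this would be called as tooltip_to_int(soup.td['data-tooltip']).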

If you have multiple td elements and need the data-tooltip value from each one:

# {'data-tooltip': True} matches only td elements that have the attribute at all.
for td in soup.find_all('td', {'data-tooltip': True}):
    print(td['data-tooltip'])
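
Since the question also mentions ElementTree: this particular snippet happens to be well-formed XML (every tag is closed and every attribute is quoted), so the standard library's xml.etree.ElementTree can read the attribute without BeautifulSoup. This is only a sketch under that assumption. Real-world HTML is frequently not well-formed (unclosed <br>, &nbsp; entities, unquoted attributes), and ET.fromstring raises ParseError on such input, which is a good reason to prefer BeautifulSoup for scraped pages.

```python
import xml.etree.ElementTree as ET

# Same snippet as in the question; it is well-formed XML.
data = """
<td class="tl-cell tl-popularity" data-tooltip="7,944,796" data-tooltip-instant="">
<div class="pop-meter">
<div class="pop-meter-background"></div>
<div class="pop-meter-overlay" style="width: 55%"></div>
</div>
</td>
"""

root = ET.fromstring(data.strip())

# root is the <td> itself; Element.iter("td") includes the root element,
# so this collects data-tooltip from every td in the fragment that has it.
tooltips = [
    td.get("data-tooltip")
    for td in root.iter("td")
    if td.get("data-tooltip") is not None
]
print(tooltips)  # prints ['7,944,796']
```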

Upvotes: 3
