How to handle td data without tag in beautifulsoup?

Question

Here is some data:

AAPL
Apple Inc.
1.65%
$153.18
$2.52
2017-05-11
2017-05-18

I need to get the values 1.65%, $153.18 and $2.52. Each one sits on a line by itself, without a tag of its own.

This code returns nothing. How can I get around this? Thanks.

import requests
from bs4 import BeautifulSoup
url = "http://www.dividend.com/dividend-stocks/dow-30-dividend-stocks.php"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

for tds in soup.find_all("td"):
    print(tds)

wpedrak · Accepted Answer

I found that html.parser is not the best choice in this case. Let's try html5lib instead. On Linux, run

sudo apt-get install python-html5lib

to install the new parser. See the Beautiful Soup documentation on html5lib for details.

This code works; it prints the text of the tds mentioned above:

import requests
from bs4 import BeautifulSoup
url = "http://www.dividend.com/dividend-stocks/dow-30-dividend-stocks.php"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html5lib")
interesting_tds = ['Dividend Yield', 'Closing Price', 'Annualized Dividend']

for td in soup.find_all("td"):
    if td.get('data-th') in interesting_tds:
        print(td.text.strip())
        # or just process td object
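
To try the same data-th filter without hitting the site, here is a minimal offline sketch. The HTML snippet is an assumption: only the three data-th labels come from the code above, while the table layout and the "Stock Symbol" and "Company Name" labels are made up for illustration. The helper name extract_row_values is hypothetical too. Because this snippet is well-formed, the built-in html.parser is enough here; the live page may still need html5lib, as described above.

```python
from bs4 import BeautifulSoup

# Assumed markup for one table row. Only the three data-th labels listed in
# WANTED are taken from the answer; everything else is illustrative.
SAMPLE_HTML = """
<table>
  <tr>
    <td data-th="Stock Symbol">AAPL</td>
    <td data-th="Company Name">Apple Inc.</td>
    <td data-th="Dividend Yield">
      1.65%
    </td>
    <td data-th="Closing Price">
      $153.18
    </td>
    <td data-th="Annualized Dividend">
      $2.52
    </td>
  </tr>
</table>
"""

WANTED = ("Dividend Yield", "Closing Price", "Annualized Dividend")


def extract_row_values(html, wanted=WANTED, parser="html.parser"):
    """Map each wanted data-th label to the stripped text of its td."""
    soup = BeautifulSoup(html, parser)
    values = {}
    for td in soup.find_all("td"):
        label = td.get("data-th")  # None when the attribute is missing
        if label in wanted:
            # strip=True removes the newlines and indentation around the value
            values[label] = td.get_text(strip=True)
    return values


if __name__ == "__main__":
    for label, value in extract_row_values(SAMPLE_HTML).items():
        print(label, "->", value)
```

Passing parser="html5lib" switches to the parser used in the answer, provided it is installed.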
