Beautiful Soup HTML Extraction

Question

I am struggling to extract the data I want, and I am sure it's very simple if you know how to use Beautiful Soup. I have been trying to get this right for hours, to no avail, even after reading the docs.

Currently my code outputs this in Python:

[0.32%, >-0.01, 0.29%, 0.38%, 0.00]

How would I isolate just the content of the td tags that do not contain nested tags?

That is, I would like to see only 0.32%, 0.29%, 0.38%.

Thank you.

import urllib2
from bs4 import BeautifulSoup

# Download the page and parse it
fturl = 'http://markets.ft.com/research/Markets/Bonds'
ftcontent = urllib2.urlopen(fturl).read()
soup = BeautifulSoup(ftcontent)

# Collect the td cells inside the first div with class "wsodModuleContent"
ftdata = soup.find(name="div", attrs={'class': 'wsodModuleContent'}).find_all(name="td", attrs={'class': ''})

Robert Lujo · Accepted Answer

Is this an OK solution for you?

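from bs4 import BeautifulSoup

# Sample cells; the nested <span> is an assumed stand-in for the inner tag
html_txt = """<td>0.32%</td>,
    <td><span>-0.01</span></td>, <td>0.29%</td>, <td>0.38%</td>,
    <td><span>0.00</span></td>
    """
soup = BeautifulSoup(html_txt)
# Keep only the cells whose text ends with a percent sign
print [tag.text for tag in soup.find_all('td') if tag.text.strip().endswith("%")]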

The output is:

[u'0.32%', u'0.29%', u'0.38%']
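
If the goal is literally to keep the cells that contain no nested tags, rather than the ones whose text happens to end in a percent sign, a structural filter is another option. The following is only a minimal sketch: it assumes Python 3, the built-in `html.parser`, and that the unwanted values are wrapped in some inner tag (the `<span>` and the sample markup here are hypothetical, since the real page markup is not shown). The helper name `plain_cells` is also made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the nested <span> is an assumed stand-in
# for whatever inner tag the real page uses.
html_txt = """
<table><tr>
<td>0.32%</td>
<td><span>-0.01</span></td>
<td>0.29%</td>
<td>0.38%</td>
<td><span>0.00</span></td>
</tr></table>
"""

soup = BeautifulSoup(html_txt, "html.parser")


def plain_cells(soup):
    """Return the text of every td that has no descendant tags."""
    # find(True) returns the first descendant tag of the cell,
    # or None when the cell holds only text.
    return [
        td.get_text(strip=True)
        for td in soup.find_all("td")
        if td.find(True) is None
    ]


values = plain_cells(soup)
print(values)  # ['0.32%', '0.29%', '0.38%']
```

Unlike the `endswith("%")` check, this does not depend on the formatting of the numbers, but it will also drop any percentage cell that happens to be wrapped in an inner tag.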
