Reputation: 1139
I am struggling to get the data I want, and I am sure it's very simple if you know how to use BeautifulSoup (BS). I have read the docs and have been trying to get this right for hours, to no avail.
Currently my code outputs this in python:
[<td>0.32%</td>, <td><span class="neg color ">>-0.01</span></td>, <td>0.29%</td>, <td>0.38%</td>, <td><span class="neu">0.00</span></td>]
How would I isolate the content of only those td tags that do not contain nested tags (here, the span elements)?
That is, I would like to see only 0.32%, 0.29%, and 0.38%.
Thank you.
import urllib2
from bs4 import BeautifulSoup
fturl = 'http://markets.ft.com/research/Markets/Bonds'
ftcontent = urllib2.urlopen(fturl).read()
soup = BeautifulSoup(ftcontent)
ftdata = soup.find(name="div", attrs={'class':'wsodModuleContent'}).find_all(name="td", attrs={'class':''})
Upvotes: 0
Views: 188
Reputation: 16371
Would this solution work for you? It keeps only the cells whose text ends with "%":
html_txt = """<td>0.32%</td>, <td><span class="neg color">
>-0.01</span></td>, <td>0.29%</td>, <td>0.38%</td>,
<td><span class="neu">0.00</span></td>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_txt)
print [tag.text for tag in soup.find_all('td') if tag.text.strip().endswith("%")]
The output is:
[u'0.32%', u'0.29%', u'0.38%']
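If you would rather filter on structure than on the "%" suffix, which is closer to what the question literally asks, one possible approach is to keep only the td cells that have no child tags at all. The following is a minimal sketch, not the only way to do it. It assumes Python 3 and the built-in "html.parser" backend; the helper name has_no_child_tags and the sample markup (including the wrapping table and tr tags) are made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the output shown in the question (illustrative only).
html_txt = """<table><tr>
<td>0.32%</td><td><span class="neg color">-0.01</span></td>
<td>0.29%</td><td>0.38%</td><td><span class="neu">0.00</span></td>
</tr></table>"""

soup = BeautifulSoup(html_txt, "html.parser")

def has_no_child_tags(td):
    # find(True) returns the first descendant tag of any name,
    # or None when the cell holds only text.
    return td.find(True) is None

# Keep the text of cells that contain no nested element such as <span>.
plain_cells = [td.get_text(strip=True)
               for td in soup.find_all("td")
               if has_no_child_tags(td)]

print(plain_cells)  # ['0.32%', '0.29%', '0.38%']
```

Unlike the endswith("%") check, this does not depend on how the values are formatted, but it would also drop a percentage cell that happens to be wrapped in a span, so which filter is better depends on the page.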
Upvotes: 2