Jack Huang

Reputation: 531

BeautifulSoup: stripping HTML tags from findAll ResultSet

I'm trying to strip all HTML tags from the ResultSet of soup.html.body.findAll('td', {'class':'yfnc_h'})

Currently, the ResultSet sometimes contains nested <a href>, <td>, and other tags. The only semi-solution I've found which acts upon the ResultSet (not the soup object) is RSelement.string

However, .string returns None as soon as the cell contains nested tags, e.g.

Input: <td class="yfnc_h" align="right">53.50</td>

Output: 53.50

Input: <td class="yfnc_h" align="right"><b>51.97</b></td>

Output: None

Input: <td class="yfnc_h" align="right"><span id="yfs_c10_djx131116c00100000"> <b style="color:#000000;">0.00</b></span></td>

Output: None

How do I strip all tags from the ResultSet output?

Upvotes: 3

Views: 2274

Answers (1)

TerryA

Reputation: 59974

Use the .text attribute of each element instead. Unlike .string, it concatenates the text of every nested tag:

print(RSelement.text)
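To apply this to the whole ResultSet, you can loop over it and pull the text out of each cell. Below is a minimal sketch, assuming the newer bs4 package (where findAll is spelled find_all and .text is shorthand for get_text()) and the built-in "html.parser"; the html string is just the three cells from the question wrapped in a made-up table, not the real page.

```python
from bs4 import BeautifulSoup

# Sample markup: the three cells from the question, wrapped in a
# hypothetical table so the snippet is self-contained.
html = """
<table><tr>
<td class="yfnc_h" align="right">53.50</td>
<td class="yfnc_h" align="right"><b>51.97</b></td>
<td class="yfnc_h" align="right"><span id="yfs_c10_djx131116c00100000"> <b style="color:#000000;">0.00</b></span></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a ResultSet, which behaves like a list of Tag objects.
cells = soup.find_all("td", {"class": "yfnc_h"})

# get_text() gathers the text of all descendants; strip=True trims each
# piece and drops whitespace-only fragments such as the space before <b>.
values = [cell.get_text(strip=True) for cell in cells]

print(values)
```

The ResultSet itself has no .text, so the attribute has to be read per element, as the list comprehension does here.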

Upvotes: 3
