I'm trying to strip all HTML tags from the ResultSet returned by soup.html.body.findAll('td', {'class':'yfnc_h'}).
Currently, the ResultSet sometimes contains nested <a href>, <td>, and other tags. The only semi-solution I've found that acts on the ResultSet elements (not the soup object) is RSelement.string.
However, .string returns None as soon as the cell contains nested tags, e.g.
Input: <td class="yfnc_h" align="right">53.50</td>
Output: 53.50
Input: <td class="yfnc_h" align="right"><b>51.97</b></td>
Output: None
Input: <td class="yfnc_h" align="right"><span id="yfs_c10_djx131116c00100000"> <b style="color:#000000;">0.00</b></span></td>
Output: None
How do I strip all tags from the ResultSet output?
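To make the desired result concrete, here is a minimal sketch that works on the markup of each cell using only the standard library's html.parser. The names strip_tags and _TextCollector are hypothetical, and the sketch assumes each ResultSet element is first converted to its markup with str(element); the three sample strings are the inputs listed above.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects only the text nodes and ignores every tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        self.parts.append(data)


def strip_tags(markup):
    """Return the text content of an HTML fragment with all tags removed."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()
    # Join the text runs and trim surrounding whitespace.
    return "".join(collector.parts).strip()


# Sample cells copied from the question; with a real ResultSet this would
# be something like [str(element) for element in result_set] (assumption).
cells = [
    '<td class="yfnc_h" align="right">53.50</td>',
    '<td class="yfnc_h" align="right"><b>51.97</b></td>',
    '<td class="yfnc_h" align="right"><span id="yfs_c10_djx131116c00100000"> '
    '<b style="color:#000000;">0.00</b></span></td>',
]
texts = [strip_tags(cell) for cell in cells]
print(texts)  # ['53.50', '51.97', '0.00']
```

If the BeautifulSoup 4 package (bs4) is in use, calling get_text() on each element is probably the shorter route; the sketch above only shows the output I'm after for the nested cases.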