I am trying to parse an HTML page. I have successfully navigated to the relevant subtree of the HTML DOM, but I am stuck at a part that contains `span` tags.
For example, I initially parse the page as follows:
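```python
import urllib2
from bs4 import BeautifulSoup as bs  # assumes BeautifulSoup 4 (bs4)

# base_url, user_id and display_name are defined elsewhere
user_url = base_url + str(user_id) + "/" + display_name
user_page = urllib2.urlopen(user_url)
souping_page = bs(user_page, "html.parser")
badges = souping_page.body.find('div', attrs={'class': 'badges'})
```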
`badges` gives me the following:
```html
<span><span title="3 gold badges"><span class="badge1"></span><span class="badgecount">3</span></span><span title="23 silver badges"><span class="badge2"></span><span class="badgecount">23</span></span><span title="43 bronze badges"><span class="badge3"></span><span class="badgecount">43</span></span></span>
```
But I want to extract `<span title="3 gold badges">` and the `title` attributes of the other `span` elements by traversing the DOM structure. How can I do that in BeautifulSoup?
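To experiment without fetching a live page, the snippet can be parsed from a string. This is a minimal sketch assuming BeautifulSoup 4 (`bs4`) with the built-in `html.parser`; the wrapping `<html><body><div class="badges">` markup is a hypothetical reconstruction so that the same `find` call from the question works offline.

```python
from bs4 import BeautifulSoup

# Badge markup from the question, wrapped in a hypothetical
# <div class="badges"> so the original find() call still applies.
html = (
    '<html><body><div class="badges"><span>'
    '<span title="3 gold badges"><span class="badge1"></span>'
    '<span class="badgecount">3</span></span>'
    '<span title="23 silver badges"><span class="badge2"></span>'
    '<span class="badgecount">23</span></span>'
    '<span title="43 bronze badges"><span class="badge3"></span>'
    '<span class="badgecount">43</span></span>'
    '</span></div></body></html>'
)

soup = BeautifulSoup(html, "html.parser")
badges = soup.body.find("div", attrs={"class": "badges"})

# The outer <span> has one direct <span> child per badge type.
outer = badges.span
print(len(outer.find_all("span", recursive=False)))
```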
You can simply do this:
```python
>>> badges.span.span
<span title="3 gold badges"><span class="badge1"></span><span class="badgecount">3</span></span>
```
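Attribute-style navigation such as `badges.span.span` returns only the first matching tag. A minimal sketch for collecting every `title`, assuming BeautifulSoup 4 and the same markup wrapped in a hypothetical `<div class="badges">`, either walks the outer span's direct children or searches for every `span` that has a `title` attribute:

```python
from bs4 import BeautifulSoup

html = (
    '<div class="badges"><span>'
    '<span title="3 gold badges"><span class="badge1"></span>'
    '<span class="badgecount">3</span></span>'
    '<span title="23 silver badges"><span class="badge2"></span>'
    '<span class="badgecount">23</span></span>'
    '<span title="43 bronze badges"><span class="badge3"></span>'
    '<span class="badgecount">43</span></span>'
    '</span></div>'
)
badges = BeautifulSoup(html, "html.parser").find("div", attrs={"class": "badges"})

# Option 1: traverse the DOM; the outer <span> holds one child <span> per badge.
titles = [child["title"] for child in badges.span.find_all("span", recursive=False)]

# Option 2: search all descendant <span> tags that carry a title attribute.
titles_any_depth = [s["title"] for s in badges.find_all("span", title=True)]

print(titles)
```

Walking the direct children mirrors the DOM traversal asked about in the question; `find_all(..., title=True)` is shorter but would also match titled spans at any depth.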