Arjun Nayini

Reputation: 191

Python and BeautifulSoup for parsing HTML

I've gotten to the point where I have the HTML, but I'd like to extract just one string from it.

There is a line in each HTML file that looks like this:

<h4 class="ws-ds-name detail-title">DATA_I_WANT</h4> 

I'm not sure how to use the .find() method to select exactly that tag and then extract DATA_I_WANT from it.

Any suggestions?

Thanks

Upvotes: 1

Views: 363

Answers (1)

mechanical_meat

Reputation: 169494

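# BeautifulSoup 3 import (Python 2); with BeautifulSoup 4 use: from bs4 import BeautifulSoup
from BeautifulSoup import BeautifulSoup as bs
markup = ''' some HTML here '''
soup = bs(markup)
# find the <h4> whose class attribute is exactly this string, then take its first child (the text)
soup.find('h4', {'class':'ws-ds-name detail-title'}).contents[0]
# result:
# u'DATA_I_WANT'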
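
For anyone on BeautifulSoup 4 (the bs4 package), here is a minimal sketch of the same lookup. The sample markup wrapped around the h4 line is made up for illustration, and the variable names are mine; only the h4 tag itself comes from the question. It also shows a CSS-selector variant, which should keep matching if the classes appear in a different order or with extra classes added.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document; only the <h4> line comes from the question.
markup = """
<html><body>
  <h4 class="ws-ds-name detail-title">DATA_I_WANT</h4>
</body></html>
"""

# "html.parser" is the parser bundled with Python, so no extra install is needed.
soup = BeautifulSoup(markup, "html.parser")

# Exact match on the full class attribute string, as in the snippet above.
tag = soup.find("h4", class_="ws-ds-name detail-title")

# CSS selector: requires both classes, regardless of order or extra classes.
tag_css = soup.select_one("h4.ws-ds-name.detail-title")

# find() and select_one() return None when nothing matches, so guard first.
data = tag.get_text(strip=True) if tag is not None else None
data_css = tag_css.get_text(strip=True) if tag_css is not None else None

print(data)
```

The None check matters if some of the HTML files turn out not to contain the tag; without it, a missing tag raises an AttributeError.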

Or you could use lxml:

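from lxml.html import fromstring
# markup is the same HTML string as above
doc = fromstring(markup)
# xpath() returns a list of matches; take the first one and read its text
doc.xpath('//h4[@class="ws-ds-name detail-title"]')[0].text
# result:
# 'DATA_I_WANT'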
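
One caveat with the XPath above: @class="..." compares the whole attribute string, so it stops matching if the classes are reordered or another class is added. A possible workaround, sketched below, is the usual XPath 1.0 idiom for testing a single class token. The has_class helper and the sample markup are my own, purely for illustration, and are not part of lxml.

```python
from lxml.html import fromstring

# Hypothetical markup: same tag as in the question, but with the classes
# reordered and an extra class added to show the difference.
markup = """
<html><body>
  <h4 class="detail-title ws-ds-name extra">DATA_I_WANT</h4>
</body></html>
"""
doc = fromstring(markup)


def has_class(name):
    """Build an XPath 1.0 predicate that tests for one class token (illustrative helper)."""
    return "contains(concat(' ', normalize-space(@class), ' '), ' %s ')" % name


# The exact-string comparison finds nothing for this markup.
exact = doc.xpath('//h4[@class="ws-ds-name detail-title"]')

# The token-based predicate matches both classes in any order.
query = "//h4[%s and %s]" % (has_class("ws-ds-name"), has_class("detail-title"))
matches = doc.xpath(query)

# xpath() returns a list, so check it is non-empty before indexing.
data = matches[0].text_content().strip() if matches else None

print(data)
```

text_content() is used here instead of .text so that any text inside nested tags within the h4 is included as well.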

Upvotes: 1
