Sara Santana

Reputation: 1019

How can I extract data from an HTML tag using Python?

I want to extract the translation of a word from an online dictionary. For example, the HTML code for 'car' is:

<ol class="sense_list level_1">
     <li class="sense_list_item level_1" value="1"><span class="def">any vehicle on wheels</span></li>

How can I extract "any vehicle on wheels" in Python, using BeautifulSoup or any other module?

Upvotes: 0

Views: 1819

Answers (3)

alecxe

Reputation: 473803

Assuming soup is the BeautifulSoup object built from the page, there are multiple ways to reach the desired element.

Probably the simplest would be to find it by class:

soup.find('span', class_='def').text

or, with a CSS selector:

soup.select('span.def')[0].text

or, additionally checking the parents:

soup.select('ol.level_1 > li.level_1 > span.def')[0].text

or:

soup.select('ol.level_1 > li[value="1"] > span.def')[0].text
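For a quick self-contained check, the lookups above can be run against the question's snippet. This is a sketch assuming BeautifulSoup 4 is installed (pip install beautifulsoup4) and using Python's built-in "html.parser"; the html variable is just the question's fragment with the closing </ol> added. The attribute value in the last selector is quoted, because CSS selector engines such as soupsieve may reject an unquoted value that starts with a digit.

```python
from bs4 import BeautifulSoup

# The fragment from the question, with the closing </ol> added.
html = """
<ol class="sense_list level_1">
     <li class="sense_list_item level_1" value="1"><span class="def">any vehicle on wheels</span></li>
</ol>
"""

soup = BeautifulSoup(html, "html.parser")

results = [
    soup.find("span", class_="def").text,                          # by class
    soup.select("span.def")[0].text,                               # CSS selector
    soup.select("ol.level_1 > li.level_1 > span.def")[0].text,     # checking parents
    soup.select('ol.level_1 > li[value="1"] > span.def')[0].text,  # by list position
]

for text in results:
    print(text)  # each line prints: any vehicle on wheels
```

In practice, select_one(...) avoids the [0] indexing and returns None instead of raising IndexError when nothing matches.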

Upvotes: 1

Sara Santana

Reputation: 1019

I solved it with BeautifulSoup:

import bs4

soup = bs4.BeautifulSoup(html, 'html.parser')  # html holds the page source
q1 = soup.find('li', class_="sense_list_item level_1", value='1').text
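One caveat with passing class_="sense_list_item level_1": BeautifulSoup treats a string containing a space as the exact value of the class attribute, so the match depends on the classes appearing in that order. The sketch below compares it with a CSS selector, which requires both classes in any order. It assumes BeautifulSoup 4 is installed; the second <li> (classes swapped, "placeholder definition" text) is a made-up variant for comparison, not taken from the dictionary page.

```python
from bs4 import BeautifulSoup

# First <li> is from the question; the second is a hypothetical variant
# with the same classes written in the opposite order.
html = """
<ol class="sense_list level_1">
  <li class="sense_list_item level_1" value="1"><span class="def">any vehicle on wheels</span></li>
  <li class="level_1 sense_list_item" value="2"><span class="def">placeholder definition</span></li>
</ol>
"""

soup = BeautifulSoup(html, "html.parser")

# Exact-string match on the class attribute: only the first <li> qualifies.
exact = [li.text for li in soup.find_all("li", class_="sense_list_item level_1")]

# CSS selector: both classes required, order irrelevant, so both <li> match.
by_css = [li.text for li in soup.select("li.sense_list_item.level_1")]

print(exact)   # ['any vehicle on wheels']
print(by_css)  # ['any vehicle on wheels', 'placeholder definition']
```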

Upvotes: 1

FTA

Reputation: 345

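Assuming that is the only HTML code given, you can use NLTK. Note that nltk.clean_html only exists in NLTK versions before 3.0; in NLTK 3.x it raises NotImplementedError and suggests BeautifulSoup's get_text() instead.

import nltk  # nltk.clean_html requires NLTK < 3.0

# htmlstring holds the HTML chunk from the question
htmlstring = '<li class="sense_list_item level_1" value="1"><span class="def">any vehicle on wheels</span></li>'
extract = nltk.clean_html(htmlstring)  # strips the tags, keeps the text
print(extract)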
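Since nltk.clean_html no longer works in NLTK 3.x, here is a dependency-free sketch of the same idea (strip every tag, keep the text) using the standard library's html.parser.HTMLParser. The names TextExtractor and strip_tags are hypothetical helpers made up for illustration, and this is a minimal tag stripper, not a full replacement for a real HTML cleaner.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, dropping tags and <script>/<style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def strip_tags(htmlstring):
    """Return the text of an HTML fragment with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(htmlstring)
    parser.close()  # flush any buffered text
    return " ".join("".join(parser.parts).split())


htmlstring = """
<ol class="sense_list level_1">
     <li class="sense_list_item level_1" value="1"><span class="def">any vehicle on wheels</span></li>
</ol>
"""
print(strip_tags(htmlstring))  # any vehicle on wheels
```

Like clean_html, this returns all the text in the fragment; on a full dictionary page you would still need to target the span with class "def" to get just the definition.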

Upvotes: 0
