Extract text from HTML tags

I want to get only the country names, not the initials. How can I go about it? Here is the HTML code:

<div class="item_country cell small-4 medium-2 large-2">
<img class="theme-flat" src="/AD/flat/64.png"/>
<p class="mb0 bold">AD</p>
<p>Andorra</p>
</div>, <div class="item_country cell small-4 medium-2 large-2">
<img class="theme-flat" src="/AE/flat/64.png"/>
<p class="mb0 bold">AE</p>
<p>United Arab Emirates</p>

I am getting:

AD
Andorra

AE
United Arab Emirates

instead of:

Andorra
United Arab Emirates

Here is my Python code:

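import requests
from bs4 import BeautifulSoup

page = requests.get('https://www.countryflags.io')
soup = BeautifulSoup(page.text, 'html.parser')
# Each country is rendered as a <div> with these classes
tables = soup.find_all(class_="item_country cell small-4 medium-2 large-2")
for table in tables:
    # get_text() returns all the text inside the <div>, initials included
    country = table.get_text()
    print(country)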
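The reason both lines show up is that get_text() concatenates every text node under the <div>, so the initials from the first <p> and the name from the second <p> end up in one string. A minimal offline sketch, assuming the sample markup above is representative of the page (it is pasted inline instead of being fetched with requests):

```python
from bs4 import BeautifulSoup

# One country entry, copied from the sample markup in the question.
html_text = '''<div class="item_country cell small-4 medium-2 large-2">
<img class="theme-flat" src="/AD/flat/64.png"/>
<p class="mb0 bold">AD</p>
<p>Andorra</p>
</div>'''

soup = BeautifulSoup(html_text, 'html.parser')
# class_='item_country' matches any tag whose class list contains that value.
div = soup.find('div', class_='item_country')

# get_text() joins every text node under the <div>, including the newlines
# between tags, so the initials and the name come back together.
whole = div.get_text()
print(repr(whole))

# stripped_strings yields each non-blank text fragment separately,
# which makes the two distinct pieces visible.
parts = list(div.stripped_strings)
print(parts)
```

To get only the name, you have to target the specific <p> that holds it rather than calling get_text() on the whole <div>.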
Answers (1)

Andrej Kesely

You can use the CSS selector .item_country p:nth-of-type(2), which selects the <p> that is the second <p> inside each tag with class="item_country":

from bs4 import BeautifulSoup


html_text = '''<div class="item_country cell small-4 medium-2 large-2">
<img class="theme-flat" src="/AD/flat/64.png"/>
<p class="mb0 bold">AD</p>
<p>Andorra</p>
</div>, <div class="item_country cell small-4 medium-2 large-2">
<img class="theme-flat" src="/AE/flat/64.png"/>
<p class="mb0 bold">AE</p>
<p>United Arab Emirates</p>'''

soup = BeautifulSoup(html_text, 'html.parser')

for p in soup.select('.item_country p:nth-of-type(2)'):
    print(p.text)

Prints:

Andorra
United Arab Emirates
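The nth-of-type(2) selector depends on the country name being the second <p> in each block. If you would rather key on the fact that, in the sample markup, the name's <p> is the one without a class attribute (an assumption about the page's markup), the selector engine behind select() also accepts :not() with an attribute selector. A small sketch:

```python
from bs4 import BeautifulSoup

# Sample markup from the question (the second <div> is left unclosed, as in the original).
html_text = '''<div class="item_country cell small-4 medium-2 large-2">
<img class="theme-flat" src="/AD/flat/64.png"/>
<p class="mb0 bold">AD</p>
<p>Andorra</p>
</div>, <div class="item_country cell small-4 medium-2 large-2">
<img class="theme-flat" src="/AE/flat/64.png"/>
<p class="mb0 bold">AE</p>
<p>United Arab Emirates</p>'''

soup = BeautifulSoup(html_text, 'html.parser')

# :not([class]) keeps only <p> tags that have no class attribute, so the
# "mb0 bold" initials paragraph is skipped regardless of its position.
names = [p.get_text(strip=True) for p in soup.select('.item_country p:not([class])')]
print(names)
```

Which selector is more robust depends on how the real page varies: position-based if the name is always the second <p>, attribute-based if the name's <p> never gets a class.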

If you prefer the standard bs4 API:

countries = soup.find_all('div', class_="item_country cell small-4 medium-2 large-2")
for c in countries:
    # class_="" is intended to pick the <p> without a class, i.e. the country name
    print(c.find('p', class_="").text)
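The same find_all approach can be wrapped in a reusable function that returns a list instead of printing. This is a sketch: extract_country_names is a hypothetical helper name, and it assumes each block has a classed <p> for the initials and a class-less <p> for the name, as in the sample markup. It checks for the missing class in plain Python rather than through a find() argument:

```python
from bs4 import BeautifulSoup


def extract_country_names(html):
    """Return the country-name text from each .item_country block.

    Hypothetical helper: assumes the initials sit in a <p> with a class
    and the name sits in a <p> without one, as in the question's sample.
    """
    soup = BeautifulSoup(html, 'html.parser')
    names = []
    # class_='item_country' matches any <div> whose class list contains that value.
    for div in soup.find_all('div', class_='item_country'):
        for p in div.find_all('p'):
            # p.get('class') is None when the tag has no class attribute.
            if not p.get('class'):
                names.append(p.get_text(strip=True))
                break  # one name per block
    return names


sample = '''<div class="item_country cell small-4 medium-2 large-2">
<img class="theme-flat" src="/AD/flat/64.png"/>
<p class="mb0 bold">AD</p>
<p>Andorra</p>
</div>
<div class="item_country cell small-4 medium-2 large-2">
<img class="theme-flat" src="/AE/flat/64.png"/>
<p class="mb0 bold">AE</p>
<p>United Arab Emirates</p>
</div>'''

print(extract_country_names(sample))
```

With the live page you would pass page.text from the requests call in the question instead of the sample string.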