mumer91

Reputation: 133

Ignore the first of two divs with the same class in BeautifulSoup

I want to scrape a few URLs, each of which has two divs with the same class="description".

The source of a sample page looks like this:

<!-- Initial HTML here -->

<div class="description">
<h4> Anonymous Title </h4>
<div class="product-description">
<li> Some stuff here </li>
</div>
</div>

<!-- Middle HTML here -->

<div class="description">
Some text here
</div>

<!-- Last HTML here -->

I'm scraping it with BeautifulSoup using the following script:

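from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')  # html holds the page source (fetching omitted)
# find() returns only the FIRST matching element
description_box = soup.find('div', attrs={'class': 'description'})
description = description_box.text.strip()
print(description)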

Running it gives me only the first div with class="description", but I want only the second one.

Any ideas on how I can ignore the first div and scrape only the second?

P.S. The first div always contains an h4 tag, and the second div contains only plain text.
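Given the P.S., the target div can also be identified by its content instead of its position. The following is a minimal sketch, assuming bs4 is installed and that the "first div has an h4, second has none" rule holds on every page; the helper name is_plain_description is hypothetical. BeautifulSoup's find() accepts a function that receives each tag and returns True for a match.

```python
from bs4 import BeautifulSoup

# Markup trimmed from the sample above.
html = """
<div class="description">
<h4> Anonymous Title </h4>
<div class="product-description">
<li> Some stuff here </li>
</div>
</div>
<div class="description">
Some text here
</div>
"""

def is_plain_description(tag):
    # Hypothetical filter: a <div> whose class list includes "description"
    # and that has no <h4> anywhere inside it.
    return (
        tag.name == "div"
        and "description" in tag.get("class", [])
        and tag.find("h4") is None
    )

soup = BeautifulSoup(html, "html.parser")
plain_box = soup.find(is_plain_description)
description = plain_box.get_text(strip=True) if plain_box else None
print(description)  # Some text here
```

This keeps working even if the order of the two divs changes, but it relies on the h4 rule, so it is only as reliable as that assumption.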

Upvotes: 0

Views: 2196

Answers (3)

QHarr

Reputation: 84465

You can use a CSS selector that combines the type and the class (div.description) and index into the returned list:

print(soup.select('div.description')[1].text)
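Indexing with [1] raises IndexError on any page that has fewer than two matching divs. A minimal sketch that guards against that case, assuming bs4 is installed; second_description_text is a hypothetical helper name and the markup is trimmed from the question:

```python
from bs4 import BeautifulSoup

html = """
<div class="description"><h4> Anonymous Title </h4></div>
<div class="description">
Some text here
</div>
"""

def second_description_text(markup):
    """Return the stripped text of the second div.description, or None."""
    soup = BeautifulSoup(markup, "html.parser")
    # select() returns every match in document order, so index 1 is the second one.
    boxes = soup.select("div.description")
    if len(boxes) < 2:
        # Avoid an IndexError on pages with only one matching div.
        return None
    return boxes[1].get_text(strip=True)

print(second_description_text(html))
print(second_description_text('<div class="description">only one</div>'))
```

Returning None lets a loop over many URLs skip pages that don't match the expected layout instead of crashing.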

Upvotes: 0

Rocky Li

Reputation: 5958

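Use a CSS selector with the :nth-of-type() pseudo-class, which picks an element by its position among sibling elements with the same tag name (the class is not counted). In the sample HTML the two description divs are the only divs at their level, so :nth-of-type(2) picks the second one. The syntax is also cleaner.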

description_box = soup.select("div.description:nth-of-type(2)")[0]
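Because :nth-of-type() counts sibling div elements regardless of their class, the result changes if another div precedes the description divs in the same parent. A minimal sketch on hypothetical markup (the banner div is an assumption, not taken from the question), assuming bs4 is installed:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an unrelated div comes before the two description divs.
html = """<body>
<div class="banner">Banner</div>
<div class="description"><h4> Anonymous Title </h4></div>
<div class="description">Some text here</div>
</body>"""

soup = BeautifulSoup(html, "html.parser")

# Matches a div that is the 2nd <div> among its siblings AND has class
# "description"; here that is the FIRST description div.
by_nth = soup.select("div.description:nth-of-type(2)")

# Indexing the full result list counts only div.description matches.
by_index = soup.select("div.description")[1]

print(by_nth[0].get_text(strip=True))  # Anonymous Title
print(by_index.get_text(strip=True))   # Some text here
```

If the real pages may contain other divs alongside the description divs, indexing the full select() result is the more robust choice.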

Upvotes: 0

chitown88

Reputation: 28585

If you use .find_all(), it returns all matches in a list. Then it's just a matter of selecting the second item in that list with index 1:

from bs4 import BeautifulSoup

html = '''<!-- Initial HTML here -->

<div class="description">
<h4> Anonymous Title </h4>
<div class="product-description">
<li> Some stuff here </li>
</div>
</div>

<!-- Middle HTML here -->

<div class="description">
Some text here
</div>

<!-- Last HTML here -->'''

soup = BeautifulSoup(html, 'html.parser')
divs = soup.find_all('div', {'class':'description'})
div = divs[1]
print(div)

Output:

<div class="description">
Some text here
</div>
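Since the question ultimately wants the text rather than the tag, a small variation can stop searching after two matches and strip the whitespace. A minimal sketch, assuming bs4 is installed; limit and the class_ keyword are standard find_all() options, and the markup is trimmed from the question:

```python
from bs4 import BeautifulSoup

html = """
<div class="description"><h4> Anonymous Title </h4></div>
<div class="description">
Some text here
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# limit=2 stops the search once two matches are found;
# class_ matches tags whose class list contains "description".
divs = soup.find_all("div", class_="description", limit=2)
# get_text(strip=True) removes the surrounding newlines and spaces.
text = divs[1].get_text(strip=True)
print(text)  # Some text here
```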

Upvotes: 2
