Reputation: 187
I am trying to scrape data from a site that has several different divs sharing the same class name.
<div class="release-date-text-wrapper" >
<div class='release-date-title'><a href="/pharrell-x-adidas-nmd-hu-sesame">Pharrell x adidas NMD Hu Sesame</a></div>
<div class='release-date-style'>Sesame/Sand-Bright Red</div>
</div>
</div>
</div>
<div class='col-xs-6 col-sm-3 col-md-3 release-date-item-continer clear-padding'>
<div class='release-date-item-wrapper'>
<div class="release-event-date-wrapper">
<div class="event-date ">
<div>
25 Oct </div>
<div>2020</div>
</div>
</div>
<div class='release-date-image-wrapper'>
<a href="/pharrell-x-adidas-nmd-hu-crystal-white" class='thumbnail'>
<img src="https://4app.kicksonfire.com/kofapp/upload/events_master_images/thumb_ipad_pharrell-x-adidas-nmd-hu-crystal-white.jpg" alt="Pharrell x adidas NMD Hu Crystal White" class="img-responsive imagecache imagecache-kofapp_list" width="250" height="200" />
</a>
</div>
<div class="release-date-text-wrapper" >
<div class='release-date-title'><a href="/pharrell-x-adidas-nmd-hu-crystal-white">Pharrell x adidas NMD Hu Crystal White</a></div>
<div class='release-date-style'>Crystal White/Clear Mint-Shock Yellow</div>
I am trying to pull the 'release-date-title' text from both divs. They should come out as follows:
Pharrell x adidas NMD Hu Sesame
Pharrell x adidas NMD Hu Crystal White
Here is the code I currently use:
Name = soup.find('div',attrs={'class':'release-date-title'}).text
This gives me the first one with no problem; the trouble is getting the second one. I tried .find_next("div"), but it returned the div with class 'release-date-style'.
EDIT: I need to be able to select them individually, as later on I will be adding them to a Discord embed, adding colors and dates to each title.
Upvotes: 0
Views: 345
Reputation: 1077
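Instead of using soup.find, which returns only the first match, use soup.findAll (spelled find_all in current Beautiful Soup; findAll is the older alias), which returns a list of all matching results. You can then iterate over the results, or index into the list, to get the one you need.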
Names = soup.findAll('div',attrs={'class':'release-date-title'})
for name in Names:
    print(name.text)
prints:
Pharrell x adidas NMD Hu Sesame
Pharrell x adidas NMD Hu Crystal White
Per our comments below, here is how you would print result #4:
Names = soup.findAll('div',attrs={'class':'release-date-title'})
print(Names[3].text)  # Indexing is zero-based, so index 3 is result #4; change the index to pick a different result
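Since you want to pick titles individually, here is a minimal sketch of one way to do that safely. The html_doc string is a trimmed-down stand-in for the markup in the question, and title_at is a hypothetical helper name I made up for illustration, not part of Beautiful Soup. It returns None instead of raising IndexError when the page has fewer results than expected.

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the page markup shown in the question.
html_doc = """
<div class="release-date-text-wrapper">
  <div class='release-date-title'><a href="/pharrell-x-adidas-nmd-hu-sesame">Pharrell x adidas NMD Hu Sesame</a></div>
  <div class='release-date-style'>Sesame/Sand-Bright Red</div>
</div>
<div class="release-date-text-wrapper">
  <div class='release-date-title'><a href="/pharrell-x-adidas-nmd-hu-crystal-white">Pharrell x adidas NMD Hu Crystal White</a></div>
  <div class='release-date-style'>Crystal White/Clear Mint-Shock Yellow</div>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# find_all returns every match in document order; keep only the text.
titles = [
    div.get_text(strip=True)
    for div in soup.find_all("div", class_="release-date-title")
]


def title_at(position):
    """Hypothetical helper: title at a 1-based position, or None if out of range."""
    if 1 <= position <= len(titles):
        return titles[position - 1]
    return None


print(title_at(1))  # Pharrell x adidas NMD Hu Sesame
print(title_at(2))  # Pharrell x adidas NMD Hu Crystal White
print(title_at(4))  # None, because this sample only contains two titles
```

On the real page the list would be longer, so title_at(4) would return the fourth title rather than None.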
Upvotes: 1
Reputation: 195438
If html_doc is your HTML snippet from the question, then this script:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')
for t in soup.select('.release-date-text-wrapper > div:nth-child(1)'):
    print(t.text)
prints:
Pharrell x adidas NMD Hu Sesame
Pharrell x adidas NMD Hu Crystal White
The CSS selector .release-date-text-wrapper > div:nth-child(1) selects the <div> that is the first child directly under each element with class="release-date-text-wrapper".
Or:
for t in soup.select('.release-date-title > a'):
    print(t.text)
Or:
for t in soup.select('.release-date-title'):
    print(t.text)
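Given the EDIT in the question about attaching colors to each title, one possible approach is to loop over the wrappers and read the title and style from inside each one, so the two values stay paired. This is only a sketch: html_doc is again a trimmed-down stand-in for the question's markup, and the releases list and its dictionary keys are names I chose for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the page markup shown in the question.
html_doc = """
<div class="release-date-text-wrapper">
  <div class='release-date-title'><a href="/pharrell-x-adidas-nmd-hu-sesame">Pharrell x adidas NMD Hu Sesame</a></div>
  <div class='release-date-style'>Sesame/Sand-Bright Red</div>
</div>
<div class="release-date-text-wrapper">
  <div class='release-date-title'><a href="/pharrell-x-adidas-nmd-hu-crystal-white">Pharrell x adidas NMD Hu Crystal White</a></div>
  <div class='release-date-style'>Crystal White/Clear Mint-Shock Yellow</div>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

releases = []
for wrapper in soup.select(".release-date-text-wrapper"):
    # Search inside this wrapper only, so each title stays with its own style.
    title = wrapper.select_one(".release-date-title")
    style = wrapper.select_one(".release-date-style")
    releases.append({
        "title": title.get_text(strip=True) if title else None,
        "style": style.get_text(strip=True) if style else None,
    })

for release in releases:
    print(release["title"], "|", release["style"])
```

Each entry in releases can then be used on its own, for example releases[1]["title"] for the second shoe.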
Upvotes: 0