Lukemul69

Reputation: 187

Extracting multiple strings from different elements with the same class (bs4 / BeautifulSoup)

I am trying to scrape data from a site that has several divs sharing the same class name.

<div class="release-date-text-wrapper" >
                        <div class='release-date-title'><a href="/pharrell-x-adidas-nmd-hu-sesame">Pharrell x adidas NMD Hu Sesame</a></div>
                        <div class='release-date-style'>Sesame/Sand-Bright Red</div>
                    </div>
                </div>
            </div>
                                <div class='col-xs-6 col-sm-3 col-md-3 release-date-item-continer clear-padding'>
                <div class='release-date-item-wrapper'>
                    <div class="release-event-date-wrapper">
                        <div class="event-date ">
                            <div>
                                25&nbsp;Oct                            </div>
                            <div>2020</div>
                        </div>
                    </div>
                    <div class='release-date-image-wrapper'>
                                                <a href="/pharrell-x-adidas-nmd-hu-crystal-white" class='thumbnail'>
                                                        <img  src="https://4app.kicksonfire.com/kofapp/upload/events_master_images/thumb_ipad_pharrell-x-adidas-nmd-hu-crystal-white.jpg" alt="Pharrell x adidas NMD Hu Crystal White" class="img-responsive imagecache imagecache-kofapp_list"  width="250" height="200" />
                        </a>
                    </div>
                    <div class="release-date-text-wrapper" >
                        <div class='release-date-title'><a href="/pharrell-x-adidas-nmd-hu-crystal-white">Pharrell x adidas NMD Hu Crystal White</a></div>
                        <div class='release-date-style'>Crystal White/Clear Mint-Shock Yellow</div>

I am trying to pull the text of the 'release-date-title' div from both blocks. The output should look like this:

Pharrell x adidas NMD Hu Sesame
Pharrell x adidas NMD Hu Crystal White

Here is the code I currently use:

Name = soup.find('div',attrs={'class':'release-date-title'}).text

This gives me the first one without a problem; the trouble is getting the second one. I tried .find_next("div"), but it returned the div with class 'release-date-style'.

EDIT: I need to be able to select them individually, because later I will add each title to a Discord embed along with its colors and dates.

Upvotes: 0

Views: 345

Answers (2)

Halmon

Reputation: 1077

Instead of using soup.find, which returns only the first match, use soup.findAll, which returns a list of all matching results (find_all is the current name for the same method in BeautifulSoup 4). You can then iterate through the results to get what you need.

Names = soup.findAll('div',attrs={'class':'release-date-title'})
for name in Names:
    print(name.text)

prints:

Pharrell x adidas NMD Hu Sesame
Pharrell x adidas NMD Hu Crystal White
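
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The html_doc string is a trimmed-down copy of the markup from the question (an assumption: the real page is expected to have the same shape), and it uses find_all with the class_ keyword, which should behave the same as the attrs form above.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the question's markup (assumed to match the real page).
html_doc = """
<div class="release-date-text-wrapper">
  <div class='release-date-title'><a href="/pharrell-x-adidas-nmd-hu-sesame">Pharrell x adidas NMD Hu Sesame</a></div>
  <div class='release-date-style'>Sesame/Sand-Bright Red</div>
</div>
<div class="release-date-text-wrapper">
  <div class='release-date-title'><a href="/pharrell-x-adidas-nmd-hu-crystal-white">Pharrell x adidas NMD Hu Crystal White</a></div>
  <div class='release-date-style'>Crystal White/Clear Mint-Shock Yellow</div>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# find_all returns every matching div; find would stop at the first one.
title_divs = soup.find_all("div", class_="release-date-title")

# get_text(strip=True) drops surrounding whitespace from each title.
titles = [div.get_text(strip=True) for div in title_divs]

for title in titles:
    print(title)
```

Keeping the titles in a plain list makes it easy to index or loop over them later.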

Per the comments below, here is how you would print result #4:

Names = soup.findAll('div',attrs={'class':'release-date-title'})
print(Names[3].text)  # Indexing is zero-based, so index 3 is result #4; change the index to pick a different result
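
Since the question's edit mentions attaching colors to each title, one possible approach is to loop over the wrapper divs and read the title and style from inside each one, so the two values stay paired. This is only a sketch: html_doc is a trimmed-down copy of the question's markup, the releases list and its dictionary keys are names made up for illustration, and the dates are not handled because they sit in a separate wrapper on the page.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the question's markup (assumed to match the real page).
html_doc = """
<div class="release-date-text-wrapper">
  <div class='release-date-title'><a href="/pharrell-x-adidas-nmd-hu-sesame">Pharrell x adidas NMD Hu Sesame</a></div>
  <div class='release-date-style'>Sesame/Sand-Bright Red</div>
</div>
<div class="release-date-text-wrapper">
  <div class='release-date-title'><a href="/pharrell-x-adidas-nmd-hu-crystal-white">Pharrell x adidas NMD Hu Crystal White</a></div>
  <div class='release-date-style'>Crystal White/Clear Mint-Shock Yellow</div>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# "releases" and its keys are hypothetical names used only for this example.
releases = []
for wrapper in soup.select("div.release-date-text-wrapper"):
    # Search inside the current wrapper only, so title and style stay paired.
    title = wrapper.select_one(".release-date-title")
    style = wrapper.select_one(".release-date-style")
    releases.append({
        "title": title.get_text(strip=True) if title else None,
        "style": style.get_text(strip=True) if style else None,
    })

# Pick one release individually by its zero-based index.
second = releases[1]
print(second["title"], "|", second["style"])
```

Each entry can then be passed to whatever builds the embed, one release at a time.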

Upvotes: 1

Andrej Kesely

Reputation: 195438

If html_doc is your HTML snippet from the question, then this script:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')

for t in soup.select('.release-date-text-wrapper > div:nth-child(1)'):
    print(t.text)

prints:

Pharrell x adidas NMD Hu Sesame
Pharrell x adidas NMD Hu Crystal White

The CSS selector .release-date-text-wrapper > div:nth-child(1) selects each <div> that is the first child directly under an element with class="release-date-text-wrapper".


Or:

for t in soup.select('.release-date-title > a'):
    print(t.text)

Or:

for t in soup.select('.release-date-title'):
    print(t.text)
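
As a quick check, the three selectors can be compared side by side. The sketch below assumes html_doc is a trimmed-down copy of the question's markup; on that snippet all three should yield the same titles. The nth-child(1) form depends on the title being the first child of its wrapper, while the class-based forms do not, and the > a form also gives direct access to the link's href.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the question's markup (assumed to match the real page).
html_doc = """
<div class="release-date-text-wrapper">
  <div class='release-date-title'><a href="/pharrell-x-adidas-nmd-hu-sesame">Pharrell x adidas NMD Hu Sesame</a></div>
  <div class='release-date-style'>Sesame/Sand-Bright Red</div>
</div>
<div class="release-date-text-wrapper">
  <div class='release-date-title'><a href="/pharrell-x-adidas-nmd-hu-crystal-white">Pharrell x adidas NMD Hu Crystal White</a></div>
  <div class='release-date-style'>Crystal White/Clear Mint-Shock Yellow</div>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

selectors = [
    ".release-date-text-wrapper > div:nth-child(1)",  # positional: first child div
    ".release-date-title > a",                        # the link inside the title div
    ".release-date-title",                            # the title div itself
]

# Collect the stripped text matched by each selector.
results = {
    sel: [tag.get_text(strip=True) for tag in soup.select(sel)]
    for sel in selectors
}

# Selecting the <a> tags also exposes each link target.
links = [a["href"] for a in soup.select(".release-date-title > a")]

for sel, texts in results.items():
    print(sel, "->", texts)
print(links)
```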

Upvotes: 0
