Paul Lena

Reputation: 43

How to scrape data from a website where divs share the same class name, using BeautifulSoup?

I am a beginner in Python and web scraping. I have been scraping data and images successfully for 3 months and just got my first freelance job. This time, however, I am finding it hard because the data I want sits in divs with the same class name as other elements, and I can't figure out how to target it specifically.

The parsed HTML is as follows:

<div class="stage-star-main-aside">
  <ul class="star-characteristics">
    <li class="row is-copy is-bold">
      <div class="gr-6">
        <span class="is-copy is-std">Country</span>
      </div>
      <div class="gr-6">
        <a href="/en/funnystar?filter%5Bgender%5D=f&amp;filter%5Bcountry%5D=US">United States</a>
      </div>
    </li>
    <li class="row is-copy is-bold">
      <div class="gr-6">
        <span class="is-copy is-std">Eye color</span>
      </div>
      <div class="gr-6">
        <a href="/en/funnystar?filter%5Bgender%5D=f&amp;filter%5Beyecolor%5D=blue">blue</a>
      </div>
    </li>
    <li class="row is-copy is-bold">
      <div class="gr-6">
        <span class="is-copy is-std">Hair color</span>
      </div>
      <div class="gr-6">
        <a href="/en/funnystar?filter%5Bgender%5D=f&amp;filter%5Bhaircolor%5D=blonde">blonde</a>
      </div>
    </li>
    <li class="row is-copy is-bold">
      <div class="gr-6">
        <span class="is-copy is-std">Height</span>
      </div>
      <div class="gr-6">
        <span class="is-copy is-std is-muted">173.0 cm (5'8")</span>
      </div>
    </li>
    <li class="row is-copy is-bold">
      <div class="gr-6">
        <span class="is-copy is-std">Weight</span>
      </div>
      <div class="gr-6">
        <span class="is-copy is-std is-muted">58 kg (128 lbs)</span>
      </div>
    </li>
    <li class="row is-copy is-bold">
      <div class="gr-6">
        <span class="is-copy is-std">BMI</span>
      </div>
      <div class="gr-6">
        <span class="is-copy is-std is-muted">19.0 (normal)</span>
      </div>
    </li>
    <a class="add-to-wrapper tc-add-to is-disabled is-small has-text" data-entity-id='{"star_id":"1991"}' data-hover-text="Remove from favorites" data-modal-text="Hot, hot hot! 🙂 If you would like to save this hot pornstar for later, please log in or" data-route="likeStar"
      data-type="favourites" href="#" tabindex="-1">
      <i class="i-fav i-anim"></i>
      <span>
        <span class="is-bold">
          <span class="is-default-text">
            Add to favorites
          </span>
          <span class="is-added-text">
            Is favorite
          </span>
        </span>
        <span class="is-regular">
        </span>
      </span>
    </a>
  </ul>
</div>

I am trying to get the country, height, weight and hair color, but as you can see, all of them are inside divs with the same class="gr-6". With the code below I get the HTML, but how do I extract specifically this data from it?

import requests
from bs4 import BeautifulSoup

url = 'https://egeniotik.com/en/funnystar/ann-ann'

response = requests.get(url)

soup = BeautifulSoup(response.text,'html.parser')
tags = soup.find_all("div", attrs={'class': 'stage-star-main-aside'})
tagsec= tags.find_all("li", attrs={'class': 'row is-copy is-bold'})

On the tagsec line I get the following error:

"ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
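The error comes from calling find_all() on the result of another find_all(): find_all() returns a ResultSet, which behaves like a list of Tag objects, and a list has no find_all() method. Assuming the page has a single stage-star-main-aside container, a minimal sketch of one fix is to use find() for the container (it returns one Tag, or None) and then call find_all() on that Tag. The HTML string below is a trimmed, hypothetical copy of the snippet from the question, not the live page.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's HTML (hypothetical sample, only two rows kept)
html = """
<div class="stage-star-main-aside">
  <ul class="star-characteristics">
    <li class="row is-copy is-bold">
      <div class="gr-6"><span class="is-copy is-std">Country</span></div>
      <div class="gr-6"><a href="#">United States</a></div>
    </li>
    <li class="row is-copy is-bold">
      <div class="gr-6"><span class="is-copy is-std">Height</span></div>
      <div class="gr-6"><span class="is-copy is-std is-muted">173.0 cm (5'8")</span></div>
    </li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns a single Tag (or None), unlike find_all(), which returns a ResultSet
container = soup.find("div", class_="stage-star-main-aside")
if container is None:
    raise ValueError("stage-star-main-aside container not found")

# find_all() is now called on a Tag, so it works; class_="row" matches any
# element whose class list contains "row"
rows = container.find_all("li", class_="row")
print(len(rows))
```

If the page could contain several such containers, the alternative is to loop over the ResultSet and call find_all() on each Tag in turn.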

Upvotes: 0

Views: 2701

Answers (1)

Barmar

Reputation: 781098

Loop through the rows. In each row, the attribute name is in the first div and the value is in the second div.

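# Select every characteristics row inside the aside container
rows = soup.select(".stage-star-main-aside li.row")
for row in rows:
    # Each row holds two div.gr-6 cells: the label first, then the value
    divs = row.find_all("div", class_="gr-6")
    attr_name = divs[0].get_text().strip()
    attr_value = divs[1].get_text().strip()
    print(f"{attr_name} = {attr_value}")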
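Since only some of the fields are needed, one option building on the loop above is to collect every row into a dictionary keyed by its label and then read just the keys you want. This is a sketch assuming the labels are exactly "Country", "Height", "Weight" and "Hair color" as in the question's HTML; the parse_characteristics helper, the wanted list and the sample HTML string are hypothetical.

```python
from bs4 import BeautifulSoup


def parse_characteristics(html):
    """Return a {label: value} dict built from the characteristics rows."""
    soup = BeautifulSoup(html, "html.parser")
    data = {}
    for row in soup.select(".stage-star-main-aside li.row"):
        divs = row.find_all("div", class_="gr-6")
        if len(divs) < 2:
            # Skip any row that lacks a label/value pair of cells
            continue
        # strip=True trims the whitespace around each piece of text
        label = divs[0].get_text(strip=True)
        value = divs[1].get_text(strip=True)
        data[label] = value
    return data


# Trimmed copy of the question's HTML (hypothetical sample, not the live page)
html = """
<div class="stage-star-main-aside">
  <ul class="star-characteristics">
    <li class="row is-copy is-bold">
      <div class="gr-6"><span class="is-copy is-std">Country</span></div>
      <div class="gr-6"><a href="#">United States</a></div>
    </li>
    <li class="row is-copy is-bold">
      <div class="gr-6"><span class="is-copy is-std">Eye color</span></div>
      <div class="gr-6"><a href="#">blue</a></div>
    </li>
    <li class="row is-copy is-bold">
      <div class="gr-6"><span class="is-copy is-std">Hair color</span></div>
      <div class="gr-6"><a href="#">blonde</a></div>
    </li>
    <li class="row is-copy is-bold">
      <div class="gr-6"><span class="is-copy is-std">Height</span></div>
      <div class="gr-6"><span class="is-copy is-std is-muted">173.0 cm (5'8")</span></div>
    </li>
    <li class="row is-copy is-bold">
      <div class="gr-6"><span class="is-copy is-std">Weight</span></div>
      <div class="gr-6"><span class="is-copy is-std is-muted">58 kg (128 lbs)</span></div>
    </li>
  </ul>
</div>
"""

info = parse_characteristics(html)

# Hypothetical list of the fields the question asks for
wanted = ["Country", "Height", "Weight", "Hair color"]
selected = {key: info.get(key) for key in wanted}
for key, value in selected.items():
    print(f"{key}: {value}")
```

Using info.get() means a label missing from a particular page comes back as None instead of raising a KeyError, which may matter if some profiles omit a field.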

Upvotes: 1
