Reputation: 43
I am a beginner in Python and web scraping. I have been scraping data and images successfully for 3 months and just got my first freelance job. This time, though, I am stuck: the data I am after sits in divs that share the same class name as other elements, and I can't figure out how to select them specifically.
The parsed HTML is shown below:
<div class="stage-star-main-aside">
<ul class="star-characteristics">
<li class="row is-copy is-bold">
<div class="gr-6">
<span class="is-copy is-std">Country</span>
</div>
<div class="gr-6">
<a href="/en/funnystar?filter%5Bgender%5D=f&filter%5Bcountry%5D=US">United States</a>
</div>
</li>
<li class="row is-copy is-bold">
<div class="gr-6">
<span class="is-copy is-std">Eye color</span>
</div>
<div class="gr-6">
<a href="/en/funnystar?filter%5Bgender%5D=f&filter%5Beyecolor%5D=blue">blue</a>
</div>
</li>
<li class="row is-copy is-bold">
<div class="gr-6">
<span class="is-copy is-std">Hair color</span>
</div>
<div class="gr-6">
<a href="/en/funnystar?filter%5Bgender%5D=f&filter%5Bhaircolor%5D=blonde">blonde</a>
</div>
</li>
<li class="row is-copy is-bold">
<div class="gr-6">
<span class="is-copy is-std">Height</span>
</div>
<div class="gr-6">
<span class="is-copy is-std is-muted">173.0 cm (5'8")</span>
</div>
</li>
<li class="row is-copy is-bold">
<div class="gr-6">
<span class="is-copy is-std">Weight</span>
</div>
<div class="gr-6">
<span class="is-copy is-std is-muted">58 kg (128 lbs)</span>
</div>
</li>
<li class="row is-copy is-bold">
<div class="gr-6">
<span class="is-copy is-std">BMI</span>
</div>
<div class="gr-6">
<span class="is-copy is-std is-muted">19.0 (normal)</span>
</div>
</li>
<a class="add-to-wrapper tc-add-to is-disabled is-small has-text" data-entity-id='{"star_id":"1991"}' data-hover-text="Remove from favorites" data-modal-text="Hot, hot hot! 🙂 If you would like to save this hot pornstar for later, please log in or" data-route="likeStar"
data-type="favourites" href="#" tabindex="-1">
<i class="i-fav i-anim"></i>
<span>
<span class="is-bold">
<span class="is-default-text">
Add to favorites
</span>
<span class="is-added-text">
Is favorite
</span>
</span>
<span class="is-regular">
</span>
</span>
</a>
</div>
I am trying to get the country, height, weight, and hair color, but as you can see, all of them sit in a div with the same class="gr-6". With the code below I get the HTML, but how do I scrape specifically those values from it?
import requests
from bs4 import BeautifulSoup
url = 'https://egeniotik.com/en/funnystar/ann-ann'
response = requests.get(url)
soup = BeautifulSoup(response.text,'html.parser')
tags = soup.find_all("div", attrs={'class': 'stage-star-main-aside'})
tagsec= tags.find_all("li", attrs={'class': 'row is-copy is-bold'})
On the tagsec line I get the following error:
AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
Upvotes: 0
Views: 2701
Reputation: 781098
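find_all() returns a ResultSet (a list of tags), which has no find_all() method of its own; that is why the tagsec line fails. Instead, select the rows and loop through them. In each row, the attribute name is in the first DIV and the value is in the second DIV.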
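# Select every <li class="row ..."> inside the aside container
rows = soup.select(".stage-star-main-aside li.row")
for row in rows:
    # Each row holds two gr-6 divs: the label first, the value second
    divs = row.find_all("div", class_="gr-6")
    attr_name = divs[0].get_text().strip()
    attr_value = divs[1].get_text().strip()
    print(f"{attr_name} = {attr_value}")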
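If you only need some of the fields, one option is to collect the rows into a dict and look the values up by label. The following is a minimal sketch, not the only way to do it: the helper name extract_characteristics, the SAMPLE_HTML constant and the wanted list are made up for illustration, and the markup is a trimmed copy of the HTML from the question so the snippet runs without a network request. It also uses find() for the outer container, which returns a single tag and so avoids the ResultSet error.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question, inlined so the sketch runs offline.
SAMPLE_HTML = """
<div class="stage-star-main-aside">
  <ul class="star-characteristics">
    <li class="row is-copy is-bold">
      <div class="gr-6"><span class="is-copy is-std">Country</span></div>
      <div class="gr-6"><a href="#">United States</a></div>
    </li>
    <li class="row is-copy is-bold">
      <div class="gr-6"><span class="is-copy is-std">Hair color</span></div>
      <div class="gr-6"><a href="#">blonde</a></div>
    </li>
    <li class="row is-copy is-bold">
      <div class="gr-6"><span class="is-copy is-std">Height</span></div>
      <div class="gr-6"><span class="is-copy is-std is-muted">173.0 cm (5'8")</span></div>
    </li>
    <li class="row is-copy is-bold">
      <div class="gr-6"><span class="is-copy is-std">Weight</span></div>
      <div class="gr-6"><span class="is-copy is-std is-muted">58 kg (128 lbs)</span></div>
    </li>
  </ul>
</div>
"""


def extract_characteristics(html):
    """Return a {label: value} dict built from the rows of the aside block."""
    soup = BeautifulSoup(html, "html.parser")
    # find() returns a single Tag (or None), so find_all() can be called on it.
    aside = soup.find("div", class_="stage-star-main-aside")
    if aside is None:
        return {}
    result = {}
    for row in aside.find_all("li", class_="row"):
        divs = row.find_all("div", class_="gr-6")
        if len(divs) < 2:
            continue  # skip rows that do not have a label/value pair
        label = divs[0].get_text(strip=True)
        value = divs[1].get_text(strip=True)
        result[label] = value
    return result


characteristics = extract_characteristics(SAMPLE_HTML)

# Keep only the fields asked about; a missing label maps to None.
wanted = ["Country", "Height", "Weight", "Hair color"]
selected = {label: characteristics.get(label) for label in wanted}
print(selected)
```

With the live page, you would pass response.text to the helper instead of SAMPLE_HTML, assuming the site still serves the same markup.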
Upvotes: 1