Reputation: 9
I know there are similar questions that have already been answered. I tried applying those answers, but they didn't fix my problem.
My problem is that on this website: http://books.toscrape.com/catalogue/page-1.html there are 20 prices, but when I try to scrape them, I only get the first price and not the other 19.
Here's the code:
from bs4 import BeautifulSoup
import requests
URL = 'http://books.toscrape.com/catalogue/page-1.html'
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find_all("div", class_ = "col-sm-8 col-md-9")
for i in results :
    prices = i.find("p", class_ = "price_color")
    print(prices.text.strip())
print()
Upvotes: 0
Views: 217
Reputation:
Wrong class. @ihonestlydontKnow, if you change this line to search for "article" instead, your code will work:
results = soup.find_all("article")
(as furas mentioned in his reply)
To check the structure, use print(results) (or paste the HTML into https://codebeautify.org/htmlviewer):
....
<article class="product_pod">
<div class="image_container">
<a href="libertarianism-for-beginners_982/index.html"><img alt="Libertarianism for Beginners" class="thumbnail" src="../media/cache/0b/bc/0bbcd0a6f4bcd81ccb1049a52736406e.jpg"/></a>
</div>
<p class="star-rating Two">
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
</p>
<h3><a href="libertarianism-for-beginners_982/index.html" title="Libertarianism for Beginners">Libertarianism for Beginners</a></h3>
<div class="product_price">
<p class="price_color">£51.33</p>
<p class="instock availability">
<i class="icon-ok"></i>
In stock
</p>
<form>
<button class="btn btn-primary btn-block" data-loading-text="Adding..." type="submit">Add to basket</button>
</form>
</div>
</article>
Output:
£51.77
£53.74
£50.10
£47.82
£54.23
£22.65
£33.34
£17.93
...
(vwebtuan) tng@rack-dff0:~$ cat a.py
from bs4 import BeautifulSoup
import requests
URL = 'http://books.toscrape.com/catalogue/page-1.html'
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find_all("article")
#print(results)
for i in results :
    prices = i.find("p", class_ = "price_color")
    print(prices.text.strip())
(vwebtuan) tng@rack-dff0:~$
Upvotes: 0
Reputation: 142651
You are searching for the items in the wrong way.
There is only one div with the class col-sm-8 col-md-9, and it contains many prices, but your code expects many divs with a single price in each one. That is what causes the problem.
With find() you get only the first price in this div; you should use find_all() to get all the prices in this single div:
div = soup.find("div", class_="col-sm-8 col-md-9")
prices = div.find_all("p", class_="price_color")
for i in prices:
    print(i.text.strip())
You could even search for the prices directly:
prices = soup.find_all("p", class_="price_color")
for i in prices:
    print(i.text.strip())
Minimal working example:
from bs4 import BeautifulSoup
import requests
url = 'http://books.toscrape.com/catalogue/page-1.html'
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
div = soup.find("div", class_="col-sm-8 col-md-9")
# search inside the single wrapping div found above
prices = div.find_all("p", class_="price_color")
for i in prices:
    print(i.text.strip())
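To see why the original loop printed only one price, here is a small offline sketch. The HTML string is a made-up, trimmed-down stand-in for the real page (an assumption, not the page's actual markup), so it runs without a network request. It only illustrates the difference between find() and find_all() inside the single wrapping div.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: one wrapping div holding several books.
HTML = """
<div class="col-sm-8 col-md-9">
  <article class="product_pod"><p class="price_color">£51.77</p></article>
  <article class="product_pod"><p class="price_color">£53.74</p></article>
  <article class="product_pod"><p class="price_color">£50.10</p></article>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Only one div has this class string, so a loop over `divs` runs once.
divs = soup.find_all("div", class_="col-sm-8 col-md-9")
print(len(divs))

# find() returns just the first matching <p> inside that div.
first_price = divs[0].find("p", class_="price_color").text.strip()
print(first_price)

# find_all() returns every matching <p> inside that div.
all_prices = [p.text.strip() for p in divs[0].find_all("p", class_="price_color")]
print(all_prices)
```

One iteration combined with find() gives exactly one price, which matches what the question describes.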
Using find() to get the price works only if you first find all the regions that contain a single price each, such as article.
Every book is in a separate article, so there are many articles, and every article has a single price (and a single title, a single image, etc.):
from bs4 import BeautifulSoup
import requests
url = 'http://books.toscrape.com/catalogue/page-1.html'
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
results = soup.find_all("article")
for i in results:
    title = i.find("h3")
    print('title:', title.text.strip())
    price = i.find("p", class_="price_color")
    print('price:', price.text.strip())
    print('---')
Result:
title: A Light in the ...
price: £51.77
---
title: Tipping the Velvet
price: £53.74
---
title: Soumission
price: £50.10
---
title: Sharp Objects
price: £47.82
---
title: Sapiens: A Brief History ...
price: £54.23
---
title: The Requiem Red
price: £22.65
---
title: The Dirty Little Secrets ...
price: £33.34
---
title: The Coming Woman: A ...
price: £17.93
---
title: The Boys in the ...
price: £22.60
---
title: The Black Maria
price: £52.15
---
title: Starving Hearts (Triangular Trade ...
price: £13.99
---
title: Shakespeare's Sonnets
price: £20.66
---
title: Set Me Free
price: £17.46
---
title: Scott Pilgrim's Precious Little ...
price: £52.29
---
title: Rip it Up and ...
price: £35.02
---
title: Our Band Could Be ...
price: £57.25
---
title: Olio
price: £23.88
---
title: Mesaerion: The Best Science ...
price: £37.59
---
title: Libertarianism for Beginners
price: £51.33
---
title: It's Only the Himalayas
price: £45.17
---
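Some titles in the result are cut off with "..." because the visible link text is shortened on the page. The article markup quoted in this thread shows that the link inside h3 also has a title attribute, so a possible way to get the full name is to read that attribute instead. This is only a sketch: the HTML below is a hand-written snippet modeled on that structure (the href is a placeholder), not a copy of the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: shortened link text, full name in the title attribute.
HTML = """
<article class="product_pod">
  <h3><a href="index.html" title="A Light in the Attic">A Light in the ...</a></h3>
  <div class="product_price"><p class="price_color">£51.77</p></div>
</article>
"""

soup = BeautifulSoup(HTML, "html.parser")

books = []
for article in soup.find_all("article"):
    link = article.find("h3").find("a")
    books.append({
        "short_title": link.text.strip(),   # visible, possibly shortened text
        "full_title": link["title"],        # value of the title attribute
        "price": article.find("p", class_="price_color").text.strip(),
    })

print(books)
```

If a link had no title attribute, link["title"] would raise KeyError; link.get("title") returns None in that case.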
Upvotes: 1
Reputation: 96
This code should work:
import requests
from bs4 import BeautifulSoup
URL = 'http://books.toscrape.com/catalogue/page-1.html'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
list_of_books = soup.select(
    # selector copied from Chrome
    '#default > div > div > div > div > section > div:nth-child(2) > ol > li'
)
for book in list_of_books:
    price = book.find('p', {'class': 'price_color'})
    print(price.text.strip())
I just used the selector copied from Chrome.
You are using find and find_all in the wrong places.
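A long copied selector depends on the exact nesting of the page. As a possible alternative, a shorter CSS selector built from the class names product_pod and price_color (both visible in the article markup quoted in this thread) may be enough. The HTML here is a made-up minimal snippet, so treat this as a sketch rather than a tested scraper for the live site.

```python
from bs4 import BeautifulSoup

# Hypothetical minimal markup with the same class names as the real page.
HTML = """
<section>
  <ol>
    <li><article class="product_pod"><div class="product_price">
      <p class="price_color">£51.77</p></div></article></li>
    <li><article class="product_pod"><div class="product_price">
      <p class="price_color">£53.74</p></div></article></li>
  </ol>
</section>
"""

soup = BeautifulSoup(HTML, "html.parser")

# select() takes a CSS selector and returns a list of all matching tags.
prices = [p.get_text(strip=True)
          for p in soup.select("article.product_pod p.price_color")]
print(prices)
```

Because the selector names only the article and the price paragraph, it keeps working if wrapper elements around the list change.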
Upvotes: 0