Reputation: 342
This is just one part of the HTML; the page contains multiple products, each with the same HTML structure.
I want the href of every product on the page.
<div class="row product-layout-category product-layout-list">
<div class="product-col wow fadeIn animated" style="visibility: visible;">
<a href="the link I want" class="product-item">
<div class="product-item-image">
<img data-src="link to an image" alt="name of the product" title="name of the product" class="img-responsive lazy" src="link to an image">
</div>
<div class="product-item-desc">
<p><span><strong>brand</strong></span></p>
<p><span class="font-size-16">name of the product</span></p>
<p class="product-item-price">
<span>product price</span></p>
</div>
</a>
</div>
.
.
.
With the code I wrote, I only get None printed a bunch of times:
from bs4 import BeautifulSoup
import requests
url = 'link to the site'
response = requests.get(url)
page = response.content
soup = BeautifulSoup(page, 'html.parser')
##this includes the part that I gave you
items = soup.find('div', {'class': 'product-layout-category'})
allItems = items.find_all('a')
for n in allItems:
    print(n.href)
How can I get it to print all the hrefs in there?
Upvotes: 0
Views: 540
Reputation: 195408
Looking at your HTML code, you can use the CSS selector a.product-item. This will select all <a> tags with class="product-item":
from bs4 import BeautifulSoup
html_text = """
<div class="row product-layout-category product-layout-list">
<div class="product-col wow fadeIn animated" style="visibility: visible;">
<a href="the link I want" class="product-item">
<div class="product-item-image">
<img data-src="link to an image" alt="name of the product" title="name of the product" class="img-responsive lazy" src="link to an image">
</div>
<div class="product-item-desc">
<p><span><strong>brand</strong></span></p>
<p><span class="font-size-16">name of the product</span></p>
<p class="product-item-price">
<span>product price</span></p>
</div>
</a>
</div>
"""
soup = BeautifulSoup(html_text, "html.parser")
for link in soup.select("a.product-item"):
    print(link.get("href"))  # or link["href"]
Prints:
the link I want
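As for why the original loop printed None: in BeautifulSoup, dot access on a tag (n.href) looks for a child tag named <href>, not the href attribute, so it returns None when there is no such child. Here is a minimal sketch of the difference. It assumes a made-up two-product snippet, and the /product/1 and /product/2 links are placeholders rather than values from the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical two-product snippet; the hrefs are placeholders, not real links.
html_text = """
<div class="row product-layout-category product-layout-list">
  <div class="product-col"><a href="/product/1" class="product-item">first</a></div>
  <div class="product-col"><a href="/product/2" class="product-item">second</a></div>
</div>
"""

soup = BeautifulSoup(html_text, "html.parser")
container = soup.find("div", {"class": "product-layout-category"})
anchors = container.find_all("a")

# Dot access searches for a child tag called <href>; none exists, so this is None.
child_lookups = [a.href for a in anchors]

# Attribute values are read with .get("href") (or a["href"]).
hrefs = [a.get("href") for a in anchors]

print(child_lookups)
print(hrefs)
```

So the find / find_all approach from the question should also work once n.href is replaced with n.get("href"), as long as all products sit inside that one container div.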
Upvotes: 1