Bora Varol

Reputation: 342

Get href links from <a> tags

This is just one part of the HTML; there are multiple products on the page with the same HTML structure.

I want the hrefs of all the products on the page:

<div class="row product-layout-category product-layout-list">
    <div class="product-col wow fadeIn animated" style="visibility: visible;">
        <a href="the link I want" class="product-item">
            <div class="product-item-image">
                <img data-src="link to an image" alt="name of the product" title="name of the product" class="img-responsive lazy" src="link to an image">
            </div>
            <div class="product-item-desc">
                <p><span><strong>brand</strong></span></p>                                            
                <p><span class="font-size-16">name of the product</span></p>
                <p class="product-item-price">
                    <span>product price</span></p>
            </div>
        </a>
    </div>
.
.
.

With the code I wrote, I only get None printed a bunch of times:

from bs4 import BeautifulSoup
import requests

url = 'link to the site'
response = requests.get(url)

page = response.content

soup = BeautifulSoup(page, 'html.parser')


# this div contains the part shown above
items = soup.find('div', {'class': 'product-layout-category'})

allItems = items.find_all('a')

for n in allItems:
    print(n.href)

How can I get it to print all the hrefs in there?

Upvotes: 0

Views: 540

Answers (1)

Andrej Kesely

Reputation: 195408

Looking at your HTML, you can use the CSS selector a.product-item, which selects all <a> tags with class="product-item":

from bs4 import BeautifulSoup


html_text = """
<div class="row product-layout-category product-layout-list">
    <div class="product-col wow fadeIn animated" style="visibility: visible;">
        <a href="the link I want" class="product-item">
            <div class="product-item-image">
                <img data-src="link to an image" alt="name of the product" title="name of the product" class="img-responsive lazy" src="link to an image">
            </div>
            <div class="product-item-desc">
                <p><span><strong>brand</strong></span></p>
                <p><span class="font-size-16">name of the product</span></p>
                <p class="product-item-price">
                    <span>product price</span></p>
            </div>
        </a>
    </div>
"""

soup = BeautifulSoup(html_text, "html.parser")

for link in soup.select("a.product-item"):
    print(link.get("href"))  # or link["href"], which raises KeyError if href is missing

Prints:

the link I want
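As for why the original code printed None: in BeautifulSoup, attribute-style access such as n.href is a shortcut for n.find("href"). It searches for a child tag named <href>, not the href attribute, so it returns None. Below is a minimal sketch that keeps the question's find/find_all approach and only changes how the attribute is read. It uses a trimmed-down copy of the markup, and the second product's href ("another link") is a made-up placeholder:

```python
from bs4 import BeautifulSoup

# Trimmed-down version of the question's markup with two products.
# The second href ("another link") is a hypothetical placeholder.
html_text = """
<div class="row product-layout-category product-layout-list">
    <div class="product-col"><a href="the link I want" class="product-item">first</a></div>
    <div class="product-col"><a href="another link" class="product-item">second</a></div>
</div>
"""

soup = BeautifulSoup(html_text, "html.parser")
items = soup.find("div", {"class": "product-layout-category"})

# n.href means n.find("href"): it looks for a child <href> tag, which
# does not exist, so the result is None (the behaviour from the question).
first = items.find("a")
print(first.href)  # None

# Attributes are read with .get() (None if missing) or [] (KeyError if missing).
# href=True restricts find_all to <a> tags that actually have an href.
hrefs = [a.get("href") for a in items.find_all("a", href=True)]
print(hrefs)  # ['the link I want', 'another link']
```

Because of the href=True filter, .get("href") never returns None in the list comprehension; without it, any <a> lacking an href would show up as None.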

Upvotes: 1
