Reputation: 109
I'm trying to get the Adidas shoe link from a search page, but I can't figure out what I'm doing wrong.
I tried tags = soup.find("section", {"class": "productList"}).findAll("a")
but it doesn't work.
I also tried printing every href on the page, and the desired link is not among them.
So I'm expecting to print this:
https://www.tennisexpress.com/adidas-mens-adizero-ubersonic-50-yrs-ltd-tennis-shoes-off-white-and-signal-blue-62138
from bs4 import BeautifulSoup
import requests
url = "https://www.tennisexpress.com/search.cfm?searchKeyword=BB6892"
# Getting the webpage, creating a Response object.
response = requests.get(url)
# Extracting the source code of the page.
data = response.text
# Passing the source code to BeautifulSoup to create a BeautifulSoup object for it.
soup = BeautifulSoup(data, 'lxml')
# Extracting all the <a> tags into a list.
tags = soup.find("section", {"class": "productList"}).findAll("a")
# Extracting URLs from the attribute href in the <a> tags.
for tag in tags:
    print(tag.get('href'))
Here's the HTML for that link:
<section class="productList"> <article class="productListing"> <a class="product" href="//www.tennisexpress.com/adidas-mens-adizero-ubersonic-50-yrs-ltd-tennis-shoes-off-white-and-signal-blue-62138" title="Men`s Adizero Ubersonic 50 Yrs LTD Tennis Shoes Off White and Signal Blue" onmousedown="return nxt_repo.product_x('38698770','1');"> <span class="sale">SALE</span> <span class="image"> <img src="//www.tennisexpress.com/prodimages/78091-DEFAULT-m.jpg" alt="Men`s Adizero Ubersonic 50 Yrs LTD Tennis Shoes Off White and Signal Blue"> </span> <span class="brand"> Adidas </span> <span class="name"> Men`s Adizero Ubersonic 50 Yrs LTD Tennis Shoes Off White and Signal Blue </span> <span class="pricing"> <strong class="listPrice">$140.00</strong> <strong class="percentOff">0% OFF</strong> <strong class="salePrice">$139.95</strong> </span> <br> </a> </article> </section>
Upvotes: 2
Views: 3433
Reputation: 3118
By inspecting the Network tab in Chrome DevTools, you can see that the products you search for are fetched by a separate request to https://tennisexpress-com.ecomm-nav.com/search.js. You can see an example response here. As you can see, it's a mess, so I wouldn't follow this approach.
In your code, you couldn't see the products because that request is made by JavaScript (running in your browser) after the initial page load. Neither standalone urllib nor requests can render that content. However, you can do that with Requests-HTML, which has JavaScript support (it uses Chromium behind the scenes).
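One quick way to confirm this is to check whether the HTML you actually downloaded contains any product anchors at all. The sketch below uses only the standard library; product_hrefs and both sample strings are hypothetical stand-ins (the second is trimmed from the snippet in the question), so treat it as an illustration rather than the site's real markup.

```python
from html.parser import HTMLParser


class ProductLinkParser(HTMLParser):
    """Collect href values of <a> tags whose class list contains 'product'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if "product" in classes and attr_map.get("href"):
            self.hrefs.append(attr_map["href"])


def product_hrefs(html):
    """Return the hrefs of all <a class="product"> tags found in html."""
    parser = ProductLinkParser()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Assumed shape of the page before JavaScript runs (hypothetical).
static_shell = '<section class="productList"></section>'

# Assumed shape after rendering, trimmed from the question's snippet.
rendered = (
    '<section class="productList"><article class="productListing">'
    '<a class="product" href="//www.tennisexpress.com/adidas-mens-adizero-'
    'ubersonic-50-yrs-ltd-tennis-shoes-off-white-and-signal-blue-62138">'
    '</a></article></section>'
)

print(product_hrefs(static_shell))
print(product_hrefs(rendered))
```

If the list is empty for response.text but the link is visible in the browser's element inspector, the content is being added by JavaScript.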
Code:
from itertools import chain
from requests_html import HTMLSession
session = HTMLSession()
url = 'https://www.tennisexpress.com/search.cfm?searchKeyword=adidas+boost'
r = session.get(url)
r.html.render()
links = list(chain(*[prod.absolute_links for prod in r.html.find('.product')]))
I used chain to join all the sets of absolute links together and created a list from the result.
>>> links
['https://www.tennisexpress.com/adidas-mens-barricade-2018-boost-tennis-shoes-black-and-night-metallic-62110',
'https://www.tennisexpress.com/adidas-mens-barricade-2018-boost-tennis-shoes-white-and-matte-silver-62109',
...
'https://www.tennisexpress.com/adidas-mens-supernova-glide-7-running-shoes-black-and-white-41636',
'https://www.tennisexpress.com/adidas-womens-adizero-boston-6-running-shoes-solar-yellow-and-midnight-gray-45268']
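To make the chain step concrete, here is a minimal sketch with made-up data. The per_product_links list stands in for the sets that absolute_links returns for each .product element; the example.com URLs are placeholders, not real results.

```python
from itertools import chain

# Hypothetical per-element link sets, one set per matched product.
per_product_links = [
    {"https://example.com/shoe-a"},
    {"https://example.com/shoe-b"},
    {"https://example.com/shoe-a"},  # same link found on another element
]

# chain.from_iterable flattens the sets without unpacking them with *.
links = list(chain.from_iterable(per_product_links))

# Flattening keeps duplicates; a set removes them if that matters.
unique_links = sorted(set(links))

print(len(links))
print(unique_links)
```

chain.from_iterable(...) is equivalent to chain(*[...]) here; it just avoids building the argument list first.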
Don't forget to install Requests-HTML with pip install requests-html.
Upvotes: 2
Reputation: 1190
Right, here's the solution:
import requests
from bs4 import BeautifulSoup as bs
url="https://www.tennisexpress.com/mens-adidas-tennis-shoes"
req = requests.get(url)
soup = bs(req.text, 'lxml')  # parse the page with the lxml HTML parser
arts = soup.findAll("a",class_="product")
That gives you a list of <a> tags for all the Adidas tennis shoes; read each tag's href to get the link. I'm sure you can manage from there.
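One detail worth knowing when you take it from there: judging by the HTML in the question, the href values are protocol-relative (they start with //), so they need a scheme before you can request them. A minimal sketch using urljoin from the standard library; the hrefs list is a stand-in for something like [a.get("href") for a in arts].

```python
from urllib.parse import urljoin

# The page the links were scraped from; its scheme is reused for the links.
page_url = "https://www.tennisexpress.com/mens-adidas-tennis-shoes"

# Stand-in for the href values read from the <a class="product"> tags.
hrefs = [
    "//www.tennisexpress.com/adidas-mens-adizero-ubersonic-50-yrs-ltd-"
    "tennis-shoes-off-white-and-signal-blue-62138",
]

# urljoin resolves protocol-relative and relative hrefs against page_url.
absolute = [urljoin(page_url, href) for href in hrefs]

for link in absolute:
    print(link)
```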
Upvotes: 0
Reputation: 805
soup = BeautifulSoup(data, "html.parser")
# find() returns a single Tag (or None); find_all() returns a list, which has no get_text()
markup = soup.find("section", class_="productList")
markupContent = markup.get_text()
So your code goes like
from urllib.request import urlopen  # Python 3 location of urlopen
from bs4 import BeautifulSoup
url = "https://www.tennisexpress.com/search.cfm?searchKeyword=BB6892"
r = urlopen(url).read()
soup = BeautifulSoup(r, "html.parser")
# find() returns a single Tag (or None if the section is missing)
productMarkup = soup.find("section", class_="productList")
product = productMarkup.get_text()
Upvotes: 1