hitterbullzeye

Reputation: 3

Python requests not returning fully loaded content

I'm trying to scrape the available sizes from here.

The content I want:

[screenshot of the rendered size options]

However I am receiving:

[<div class="options" id="productSizeStock">
<button class="btn options-loading" disabled="" type="button">
</button>
<button class="btn options-loading" disabled="" type="button">
</button>
<button class="btn options-loading" disabled="" type="button">
</button>
<button class="btn options-loading" disabled="" type="button">
</button>
<button class="btn options-loading" disabled="" type="button">
</button>
<button class="btn options-loading" disabled="" type="button">
</button>
<button class="btn options-loading" disabled="" type="button">
</button>
<button class="btn options-loading" disabled="" type="button">
</button>
<button class="btn options-loading" disabled="" type="button">
</button>
<button class="btn options-loading" disabled="" type="button">
</button>
<button class="btn options-loading" disabled="" type="button">
</button>
<button class="btn options-loading" disabled="" type="button">
</button>
</div>]

I also tried using requests-html to see whether it was a JavaScript rendering issue, but I was just receiving empty values.

Here is my code:

import time

import requests
import randomheaders
from bs4 import BeautifulSoup

proxy = {'''PROXY'''}
while True:
    try:
        source = requests.get(
            "https://www.size.co.uk/product/grey-nike-air-max-98-se/132114/",
            proxies=proxy,
            headers=randomheaders.LoadHeader(),
            timeout=30,
        ).text
        soup = BeautifulSoup(source, features="lxml")
        print(soup.find_all("div", class_="options"))

    except Exception as e:
        print(e)

    time.sleep(5)

Upvotes: 0

Views: 6102

Answers (2)

Nazim Kerimbekov

Reputation: 4783

From a technical point of view your code is correct. Because this website uses JavaScript to render itself, the sizes are stored at a different URL, which is the following:

https://www.size.co.uk/product/grey-nike-air-max-98-se/132114/stock

As you can see, you just have to append /stock to your initial URL.


That being said, try replacing this:

source = requests.get("https://www.size.co.uk/product/grey-nike-air-max-98-se/132114/", proxies= proxy, headers=randomheaders.LoadHeader(),timeout=30).text
soup = BeautifulSoup(source, features = "lxml")
print(soup.find_all("div", class_="options"))

with:

source = requests.get("https://www.size.co.uk/product/grey-nike-air-max-98-se/132114/stock", proxies= proxy, headers=randomheaders.LoadHeader(),timeout=30).text
soup = BeautifulSoup(source, features = "lxml")
sizes = [x["title"].replace("Select Your UK Size ","") for x in soup.find_all("button",{"data-e2e":"product-size"})]
print(sizes)

Here, sizes is a list containing all of the available sizes; printing it gives the following output:

['6', '7', '7.5', '8', '8.5', '9', '9.5', '10', '10.5', '11', '11.5', '12']
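If you want to exercise the extraction logic without hitting the site (or needing a proxy), you can run it against a small snippet of markup. The HTML below is a hand-written assumption of the /stock fragment's shape based on the selector used above, not a capture of the real response:

```python
from bs4 import BeautifulSoup

# Illustrative markup assumed to resemble the /stock fragment (not a real capture).
sample = """
<div class="options" id="productSizeStock">
  <button class="btn" data-e2e="product-size" title="Select Your UK Size 6">6</button>
  <button class="btn" data-e2e="product-size" title="Select Your UK Size 7.5">7.5</button>
</div>
"""

soup = BeautifulSoup(sample, features="html.parser")

# Same extraction as above: pick out the data-e2e="product-size" buttons
# and strip the title prefix to leave just the size.
sizes = [
    x["title"].replace("Select Your UK Size ", "")
    for x in soup.find_all("button", {"data-e2e": "product-size"})
]
print(sizes)  # ['6', '7.5']
```

The html.parser backend is used here so the snippet runs even without lxml installed.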

Hope this helps!

Upvotes: 1

Alex Gidan

Reputation: 2679

It is probably because the information you are searching for is added dynamically by a client-side script (JavaScript in this case). If so, I don't see an easy way to get it with requests alone; you would have to analyse the page's scripts and, if really motivated, reproduce the proper AJAX requests yourself.

So, to recap: you are not getting the expected results because any JS-generated content has to be rendered into the document, and when you fetch the HTML page you fetch only the initial document.

A possible solution (requests-html supports Python 3.6+ only) consists in using requests-html instead of requests:

This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible.

  1. Install requests-html: pipenv install requests-html

  2. Make a request to the page's url:

    from requests_html import HTMLSession
    
    session = HTMLSession()
    r = session.get(a_page_url)
    
  3. Render the response to get the JavaScript-generated bits:

    r.html.render()
    

This module offers scraping with JavaScript support, which is exactly what you need.
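The steps above can be put together as a sketch. Note the caveats: requests-html downloads Chromium on the first render() call, and the CSS selector below is an assumption based on the markup shown in the other answer, not something verified against the live page:

```python
def fetch_sizes(url):
    """Fetch a product page, render its JavaScript, and scrape the size buttons.

    requests-html is a third-party dependency (pip install requests-html), so it
    is imported lazily here; the first call to render() downloads Chromium.
    """
    from requests_html import HTMLSession

    session = HTMLSession()
    r = session.get(url)
    r.html.render()  # executes the page's JavaScript

    # Assumed selector, mirroring the markup from the other answer.
    buttons = r.html.find('button[data-e2e="product-size"]')
    return [b.attrs["title"].replace("Select Your UK Size ", "") for b in buttons]

# Calling fetch_sizes("https://www.size.co.uk/product/grey-nike-air-max-98-se/132114/")
# would return a list of size strings when the render succeeds.
```

The selector and title-stripping logic would need adjusting if the site's markup differs from what is assumed here.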

Upvotes: 3
