Haider

Reputation: 71

How to speed up requests in Python

So I have this code that scrapes JavaScript-rendered content:

from requests_html import HTMLSession

#create the session
session = HTMLSession()

#define our URL
url = 'https://partalert.net/product.js?asin=B08L8LG4M3&price=%E2%82%AC702.07&smid=A3JWKAKR8XB7XF&tag=partalertde-21&timestamp=16%3A33+UTC+%2821.4.2021%29&title=ASUS+DUAL+NVIDIA+GeForce+RTX+3070+OC+Edition+Gaming+Grafikkarte+%28PCIe+4.0%2C+8+GB+GDDR6+Speicher%2C+HDMI+2.1%2C+DisplayPort+1.4a%2C+Axial-tech+L%C3%BCfterdesign%2C+Dual+BIOS%2C+Schutzr%C3%BCckwand%2C+GPU+Tweak+II%29&tld=.de'

#use the session to get the data
r = session.get(url)

#render the page; increase scrolldown to page down multiple times
r.html.render(sleep=0, keep_page=True, scrolldown=0)

#take the rendered html and find the element that we are interested in
links = r.html.find('#href')

#loop through those elements, extracting the absolute links of each one
for item in links:
    link = {
        'link': item.absolute_links
    }
    print(link)

However, it takes 2-3 seconds, which is way too long for me. Is there a way to speed it up?

Upvotes: 0

Views: 582

Answers (1)

RJ Adriaansen

Reputation: 9639

There is no need to scrape the site at all. When you look at the page's source code, you can see that JavaScript builds the Amazon URL from the query parameters of the input URL:

document.getElementById("href").href =
    `https://www.amazon${tld}/dp/${asin}?tag=${tag}&linkCode=ogi&th=1&psc=1&smid=${smid}`;

This means that you only have to replicate this logic in Python to generate your URLs. No HTTP request or page rendering is involved, so the 2-3 second delay goes away. You can get the values of the URL parameters with urllib.parse, then use string formatting to build the new URL:

from urllib.parse import urlsplit, parse_qs

url = 'https://partalert.net/product.js?asin=B08L8LG4M3&price=%E2%82%AC702.07&smid=A3JWKAKR8XB7XF&tag=partalertde-21&timestamp=16%3A33+UTC+%2821.4.2021%29&title=ASUS+DUAL+NVIDIA+GeForce+RTX+3070+OC+Edition+Gaming+Grafikkarte+%28PCIe+4.0%2C+8+GB+GDDR6+Speicher%2C+HDMI+2.1%2C+DisplayPort+1.4a%2C+Axial-tech+L%C3%BCfterdesign%2C+Dual+BIOS%2C+Schutzr%C3%BCckwand%2C+GPU+Tweak+II%29&tld=.de'
query = urlsplit(url).query
params = parse_qs(query)
amazon_url = f"https://www.amazon{params['tld'][0]}/dp/{params['asin'][0]}?tag={params['tag'][0]}&linkCode=ogi&th=1&psc=1&smid={params['smid'][0]}"
print(amazon_url)

Result:

https://www.amazon.de/dp/B08L8LG4M3?tag=partalertde-21&linkCode=ogi&th=1&psc=1&smid=A3JWKAKR8XB7XF
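If you need to convert many links, the same steps can be wrapped in a small helper. This is only a sketch: the function name build_amazon_url and the REQUIRED_PARAMS tuple are my own (hypothetical) names, and it assumes every input URL carries the tld, asin, tag and smid parameters in the same way as the example above. It raises a ValueError when one of them is missing instead of failing with a KeyError.

```python
from urllib.parse import urlsplit, parse_qs

# Query parameters used by the page's JavaScript (hypothetical constant name).
REQUIRED_PARAMS = ("tld", "asin", "tag", "smid")


def build_amazon_url(partalert_url: str) -> str:
    """Build the Amazon URL from a partalert URL without any HTTP request."""
    # parse_qs maps each parameter name to a list of decoded values.
    params = parse_qs(urlsplit(partalert_url).query)

    # Blank or absent parameters do not appear in the parse_qs result.
    missing = [name for name in REQUIRED_PARAMS if name not in params]
    if missing:
        raise ValueError(f"missing query parameters: {', '.join(missing)}")

    # Take the first value of each parameter, in REQUIRED_PARAMS order.
    tld, asin, tag, smid = (params[name][0] for name in REQUIRED_PARAMS)
    return (
        f"https://www.amazon{tld}/dp/{asin}"
        f"?tag={tag}&linkCode=ogi&th=1&psc=1&smid={smid}"
    )


if __name__ == "__main__":
    # Shortened version of the URL from the question (unused parameters dropped).
    example = (
        "https://partalert.net/product.js?asin=B08L8LG4M3"
        "&smid=A3JWKAKR8XB7XF&tag=partalertde-21&tld=.de"
    )
    print(build_amazon_url(example))
```

Since this is pure string processing, each call should take microseconds rather than seconds. If the site ever changes how it builds the link, the template in the return statement has to be updated to match.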

Upvotes: 2
