Ibtsam Ahmad

Reputation: 421

Python Requests does not get a website that opens in the browser

I have been trying to access this website https://www.dickssportinggoods.com/f/tents-accessories with the requests module, but the call just keeps processing and never returns, while the same website loads fine in a browser. Scrapy gives a timeout error for the same website. Is there something that should be taken into account when accessing websites like these? Thanks
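On the hanging itself: requests has no default timeout, so a server that accepts the connection but never answers can keep `requests.get()` waiting indefinitely. A minimal sketch, assuming failing fast is preferable to hanging; the helper name `fetch` and the timeout values are illustrative only. The demo uses a local socket (not the real site) that accepts connections but never replies.

```python
import socket

import requests


def fetch(url, timeout=(5, 15)):
    """Return the Response, or None if the server does not answer in time.

    timeout is a (connect, read) pair in seconds.
    """
    try:
        return requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        # Covers both ConnectTimeout and ReadTimeout
        return None


# Local stand-in for a stalled server: the listen backlog completes the TCP
# handshake, but nothing ever reads the request or sends a reply.
silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent.bind(("127.0.0.1", 0))
silent.listen(1)
port = silent.getsockname()[1]

result = fetch(f"http://127.0.0.1:{port}/", timeout=(2, 1))
print("timed out" if result is None else result.status_code)
silent.close()
```

Without the `timeout` argument, the same call against the silent socket would block until the process is killed, which matches the "keeps processing" symptom.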

Upvotes: 1

Views: 7192

Answers (3)

Ibtsam Ahmad

Reputation: 421

Thanks to @Marcel and @Sonal. Apart from adding headers, it only worked once I put the request in a try/except block:

import requests


def get_page(link):
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36'
    }
    session = requests.Session()

    try:
        r = session.get(link, headers=headers, stream=True)
        return r
    except requests.exceptions.ConnectionError:
        # session.get() raised, so there is no response object r to update;
        # report the failure and return None instead
        print("Connection refused:", link)
        return None
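If the failures are intermittent connection errors, a `Session` can retry them instead of giving up on the first one. A sketch assuming urllib3's `Retry` class (installed as a dependency of requests); `make_session` and the retry settings are hypothetical choices, and retries will not help if the site is deliberately blocking the client.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(user_agent, retries=3):
    """Build a Session that sends a browser-like User-Agent and retries failures."""
    retry = Retry(
        total=retries,
        backoff_factor=0.5,  # exponential back-off between attempts
        status_forcelist=(429, 500, 502, 503, 504),  # also retry these status codes
    )
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


session = make_session(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36"
)
# r = session.get(link, timeout=(5, 15))
```

Mounting the adapter on the session means every request made through it gets the same retry policy, so the retry logic does not have to live in a hand-written loop around `session.get()`.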

Upvotes: -2

Sonal Agrawal

Reputation: 175

For sites like these, you can try adding the extra headers that your browser sends. Following these steps worked for me:

  1. Open the link in an incognito window with the Network tab open.
  2. Copy the first request by right-clicking it -> Copy -> Copy as cURL.
  3. Go to https://curl.trillworks.com/ and paste the cURL command to get the equivalent Python requests code.
  4. Remove headers one by one until you find the minimal set that still works.

Image for reference - https://i.sstatic.net/vRS98.png
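Step 3 can also be done locally. A rough sketch that pulls the `-H`/`--header` values out of a bash-style "Copy as cURL" command with the standard-library `shlex` module; `headers_from_curl` is a hypothetical helper, and it ignores every other cURL option (cookies passed with `-b`, `--compressed`, and so on), so it is not a full cURL parser.

```python
import shlex


def headers_from_curl(curl_command):
    """Extract the -H/--header values of a bash-style cURL command into a dict."""
    # Join backslash-newline continuations so shlex sees one command line
    tokens = shlex.split(curl_command.replace("\\\n", " "))
    headers = {}
    for flag, value in zip(tokens, tokens[1:]):
        if flag in ("-H", "--header"):
            # Split on the first colon only; values may contain more colons
            name, _, val = value.partition(":")
            headers[name.strip()] = val.strip()
    return headers


example = """curl 'https://www.dickssportinggoods.com/f/tents-accessories' \\
  -H 'accept-language: en-US,en;q=0.9' \\
  -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64)' \\
  --compressed"""
print(headers_from_curl(example))
```

The resulting dict can be passed straight to `requests.get(url, headers=...)`.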

Edit:

import requests

headers = {
    'authority': 'www.dickssportinggoods.com',
    'pragma': 'no-cache',
    'cache-control': 'no-cache',
    'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
    'sec-ch-ua-mobile': '?0',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'none',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'document',
    'accept-language': 'en-US,en;q=0.9',
}

response = requests.get('https://www.dickssportinggoods.com/f/tents-accessories', headers=headers)

print(response.text)
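Step 4 can be automated once you decide what "works" means. A sketch assuming a caller-supplied predicate `works(headers)`; both `minimize_headers` and the example predicate `page_loads` (status 200 within 10 seconds) are hypothetical. It is a greedy one-pass search, so the result depends on header order and is not guaranteed to be the smallest possible set.

```python
import requests


def minimize_headers(headers, works):
    """Greedily drop each header that works(headers) does not need."""
    needed = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in needed.items() if k != name}
        if works(trial):
            needed = trial  # this header was not required; leave it out
    return needed


def page_loads(headers, url="https://www.dickssportinggoods.com/f/tents-accessories"):
    """Example predicate: the page answers with 200 within 10 seconds."""
    try:
        return requests.get(url, headers=headers, timeout=10).status_code == 200
    except requests.exceptions.RequestException:
        return False


# Usage (makes one real request per header in the dict):
# minimal = minimize_headers(headers, page_loads)
```

Keeping the predicate separate lets you test the search offline with a fake `works` function before spending real requests on the site.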

Upvotes: 7

marcel h

Reputation: 788

Have you tried adding headers?


import requests

headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get('https://www.dickssportinggoods.com/f/tents-accessories', headers=headers)
response.raise_for_status()

print(response.text)

Upvotes: 3
