dmort

Reputation: 301

Python urllib request always results in Error 400: Bad Request

Thanks for reading. For a small research project, I'm trying to gather some data from KBB (www.kbb.com). However, I always get a "urllib.error.HTTPError: HTTP Error 400: Bad Request" error. I think I can access other websites with this simple piece of code, so I'm not sure whether this is an issue with the code or with this specific website.

Maybe someone can point me in the right direction.

from urllib import request as urlrequest
proxy_host = '23.107.176.36:32180'
url = "https://www.kbb.com/gmc/canyon-extended-cab/2018/"

req = urlrequest.Request(url)
req.set_proxy(proxy_host, 'https')

page = urlrequest.urlopen(req)
print(page)

Upvotes: 1

Views: 2221

Answers (2)

Federico Baù

Reputation: 7656

There are two issues, but one solution, which I describe below:

  1. The proxy server is refusing the request.
  2. The server seems to require authentication: without the proxy, it responds with a 403 Forbidden in every case.

Using urllib

from urllib import request as urlrequest
proxy_host = '23.107.176.36:32180'
url = "https://www.kbb.com/gmc/canyon-extended-cab/2018/"

req = urlrequest.Request(url)
# req.set_proxy(proxy_host, 'https')

page = urlrequest.urlopen(req)
print(page)

> urllib.error.HTTPError: HTTP Error 403: Forbidden

Using Requests

import requests

url = "https://www.kbb.com/gmc/canyon-extended-cab/2018/"

res = requests.get(url)
print(res)
# >>> <Response [403]>
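
To get more detail than the bare status code out of the urllib variant, something like the following might help. It is only a sketch: the helper names `summarize_http_error` and `fetch_or_summarize` are made up for this example, and the `Server` and `Via` headers may or may not be present, depending on what answered the request. When they are present, they can hint at whether the proxy or the site produced the response.

```python
import urllib.error
import urllib.request


def summarize_http_error(err):
    """Collect the parts of an HTTPError that are useful for diagnosis."""
    headers = err.headers if err.headers is not None else {}
    return {
        "code": err.code,                 # e.g. 400 or 403
        "reason": err.reason,             # e.g. "Bad Request"
        "server": headers.get("Server"),  # None if the header is missing
        "via": headers.get("Via"),        # sometimes added by proxies
    }


def fetch_or_summarize(url, timeout=30):
    """Return the body on success, or a summary dict on an HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        return summarize_http_error(err)
```

Calling `fetch_or_summarize(url)` with the KBB URL from the question would then return either the page bytes or a small dict describing the error.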

Using Postman

[Screenshot not preserved]

Edit: Solution

Setting a slightly longer timeout makes it work. However, I had to retry several times, because the proxy sometimes just doesn't respond.

import urllib.request


proxy_host = '23.107.176.36:32180'
url = "https://www.kbb.com/gmc/canyon-extended-cab/2018/"

proxy_support = urllib.request.ProxyHandler({'https' : proxy_host})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)

res = urllib.request.urlopen(url, timeout=1000)  # timeout is in seconds
print(res.read())

Result

b'<!doctype html><html lang="en"><head><meta http-equiv="X-UA-Compatible" content="IE=edge"><meta charset="utf-8"><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=5,minimum-scale=1"><meta http-equiv="x-dns-prefetch-control" content="on"><link rel="dns-prefetch preconnect" href="//securepubads.g.doubleclick.net" crossorigin><link rel="dns-prefetch preconnect" href="//c.amazon-adsystem.com" crossorigin><link .........
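
Since the proxy sometimes does not respond, the manual retrying could be wrapped in a small helper. This is a rough sketch, not a tuned solution: `fetch_with_retries` and `fetch_kbb` are hypothetical names, and the attempt count, delay, and 30 second timeout are arbitrary values picked for illustration.

```python
import time
import urllib.error
import urllib.request

PROXY_HOST = "23.107.176.36:32180"
URL = "https://www.kbb.com/gmc/canyon-extended-cab/2018/"


def fetch_with_retries(fetch, attempts=5, delay=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError as err:
            # OSError covers urllib.error.URLError (and HTTPError),
            # socket timeouts, and dropped connections.
            last_error = err
            if attempt < attempts:
                sleep(delay)
    raise last_error


def fetch_kbb():
    """Fetch the page through the proxy using a dedicated opener."""
    proxy_support = urllib.request.ProxyHandler({"https": PROXY_HOST})
    opener = urllib.request.build_opener(proxy_support)
    with opener.open(URL, timeout=30) as response:  # timeout in seconds
        return response.read()
```

Usage would be `html = fetch_with_retries(fetch_kbb)`. The last error is re-raised if every attempt fails, so a proxy that is permanently down still surfaces as an exception.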

Using Requests

import requests
proxy_host = '23.107.176.36:32180'
url = "https://www.kbb.com/gmc/canyon-extended-cab/2018/"

# NOTE: the proxy needs a longer timeout to respond; verify=False works around
# an SSL error by disabling certificate verification
r = requests.get(url, proxies={"https": proxy_host}, timeout=90, verify=False)  # timeout is in seconds
print(r.text)

Upvotes: 1

Rusticus

Reputation: 382

Your code appears to work fine without the set_proxy statement. I think it is most likely that your proxy server, rather than KBB, is rejecting the request.
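
One way to check this, sketched under assumptions: build one opener that goes through the proxy and one that bypasses proxies entirely, then request the same URL with both and compare the outcomes. `make_opener` and `probe` are hypothetical helper names. Passing an empty dict to `ProxyHandler` turns off proxy use, including proxies picked up from environment variables.

```python
import urllib.error
import urllib.request


def make_opener(proxy_host=None):
    """Build an opener that uses proxy_host for HTTPS, or no proxy at all."""
    proxies = {"https": proxy_host} if proxy_host else {}
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


def probe(opener, url, timeout=30):
    """Return the HTTP status code, or a description of the failure."""
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code   # a server (or the proxy) answered with an error status
    except OSError as err:
        return repr(err)  # no HTTP answer at all: timeout, refused connection, ...
```

For example, comparing `probe(make_opener("23.107.176.36:32180"), url)` with `probe(make_opener(), url)` shows whether the result changes once the proxy is taken out of the path.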

Upvotes: 0
