Reputation: 301
Thanks for reading. For a small research project, I'm trying to gather some data from KBB (www.kbb.com). However, I always get a "urllib.error.HTTPError: HTTP Error 400: Bad Request" error. I think the same simple piece of code can access other websites, so I'm not sure whether this is an issue with the code or with this specific website.
Maybe someone can point me in the right direction.
from urllib import request as urlrequest
proxy_host = '23.107.176.36:32180'
url = "https://www.kbb.com/gmc/canyon-extended-cab/2018/"
req = urlrequest.Request(url)
req.set_proxy(proxy_host, 'https')
page = urlrequest.urlopen(req)
print(page)
Upvotes: 1
Views: 2221
Reputation: 7656
There are two issues here, but I found one solution that covers both; see below.
Using urllib
from urllib import request as urlrequest
proxy_host = '23.107.176.36:32180'
url = "https://www.kbb.com/gmc/canyon-extended-cab/2018/"
req = urlrequest.Request(url)
# req.set_proxy(proxy_host, 'https')
page = urlrequest.urlopen(req)
print(req)
> urllib.error.HTTPError: HTTP Error 403: Forbidden
Using Requests
import requests
url = "https://www.kbb.com/gmc/canyon-extended-cab/2018/"
res = requests.get(url)
print(res)
# >>> <Response [403]>
Using PostMan
With a slightly longer timeout it works. However, I had to retry several times, because the proxy sometimes just doesn't respond.
import urllib.request
proxy_host = '23.107.176.36:32180'
url = "https://www.kbb.com/gmc/canyon-extended-cab/2018/"
proxy_support = urllib.request.ProxyHandler({'https' : proxy_host})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)
res = urllib.request.urlopen(url, timeout=1000)  # timeout is in seconds; set generously because the proxy is slow
print(res.read())
Result
b'<!doctype html><html lang="en"><head><meta http-equiv="X-UA-Compatible" content="IE=edge"><meta charset="utf-8"><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=5,minimum-scale=1"><meta http-equiv="x-dns-prefetch-control" content="on"><link rel="dns-prefetch preconnect" href="//securepubads.g.doubleclick.net" crossorigin><link rel="dns-prefetch preconnect" href="//c.amazon-adsystem.com" crossorigin><link .........
import requests
proxy_host = '23.107.176.36:32180'
url = "https://www.kbb.com/gmc/canyon-extended-cab/2018/"
# NOTE: the proxy needs a longer timeout to respond; verify=False works around an SSL error (it disables certificate verification)
r = requests.get(url, proxies={"https": proxy_host}, timeout=90, verify=False)  # timeout is in seconds, not milliseconds
print(r.text)
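Since the proxy sometimes just doesn't respond, the manual retrying mentioned above could be wrapped in a small helper. This is only a rough sketch: `fetch_with_retries` and its parameters (`attempts`, `delay`, `retry_on`) are hypothetical names, not part of urllib or requests. It relies on the fact that `urllib.error.URLError`, socket timeouts, and `requests.exceptions.RequestException` are all subclasses of `OSError`.

```python
import time


def fetch_with_retries(fetch, attempts=5, delay=2.0, retry_on=(OSError,)):
    """Call fetch() until it succeeds or the attempts run out.

    fetch is any zero-argument callable that performs the request.
    This helper is illustrative only; it is not a library function.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on as exc:
            last_error = exc
            # Wait before the next try, but not after the final one.
            if attempt < attempts:
                time.sleep(delay)
    raise last_error


# Possible usage with the opener installed above (not executed here):
# html = fetch_with_retries(lambda: urllib.request.urlopen(url, timeout=30).read())
```

Note that this retries on every `OSError`, including HTTP errors such as 403, which a retry is unlikely to fix; narrowing `retry_on` to timeout errors may be more appropriate.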
Upvotes: 1
Reputation: 382
Your code appears to work fine without the set_proxy statement, so I think it is most likely that your proxy server, rather than KBB, is rejecting the request.
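One way to look for hints about who rejected the request is to catch the `HTTPError` and inspect the status and response headers. The sketch below assumes that headers such as `Server` or `Via` are present and informative, which is not guaranteed; `summarize_http_error` is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def summarize_http_error(err):
    """Collect the parts of an HTTPError that may hint at its origin."""
    headers = err.headers or {}
    return {
        "status": err.code,
        "reason": err.reason,
        # These headers are optional; either may be None.
        "server": headers.get("Server"),
        "via": headers.get("Via"),
    }


# Possible usage with the request from the question (not executed here):
# try:
#     page = urlrequest.urlopen(req)
# except urllib.error.HTTPError as err:
#     print(summarize_http_error(err))
```

If the reported `Server` value looks like proxy software rather than the site's own server, that would be consistent with the proxy producing the 400, though it is not proof.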
Upvotes: 0