thepunitsingh

Reputation: 731

Cannot connect to proxy error on requests.get() or requests.post() in python

I have two URLs to fetch data from. With my code, the first URL works, whereas the second URL raises a ProxyError.

I am using the requests library in Python 3 and have searched for this problem on Google and here, but with no success.

My code snippet is:

    import requests

    proxies = {
      'http': 'http://user:[email protected]:xxxx',
      'https': 'http://user:[email protected]:xxxx',
    }

    url1 = 'https://en.oxforddictionaries.com/definition/act'
    url2 = 'https://dictionary.cambridge.org/dictionary/english/act'

    r1 = requests.get(url1, proxies=proxies)
    r2 = requests.get(url2, proxies=proxies)

url1 works fine, but url2 gives the following error:

    ProxyError: HTTPSConnectionPool(host='dictionary.cambridge.org', port=443): Max retries exceeded with url: /dictionary/english/act (Caused by ProxyError('Cannot connect to proxy.', RemoteDisconnected('Remote end closed connection without response',)))

The same happens when using requests.post().

  1. Please explain why this is happening. Is there any difference between the handshaking of the two URLs?

  2. urllib.request.urlopen works fine, so I am explicitly looking for answers that use requests.

Upvotes: 8

Views: 38419

Answers (2)

lilin

Reputation: 11

    import time

    import requests


    def get_random_proxy():
        """Get a random proxy (host:port) from a local proxy pool."""
        proxypool_url = 'http://127.0.0.1:5555/random'
        return requests.get(proxypool_url).text.strip()


    headers = {
        'User-Agent': 'Chrome',
        'Referer': 'https://www.nmpa.gov.cn/datasearch/home-index.html?79QlcAyHig6m=1636513393895',
        'Host': 'nmpa.gov.cn',
        'Origin': 'https://nmpa.gov.cn',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Connection': 'close'
    }
    url = 'https://www.nmpa.gov.cn/datasearch/search-result.html'


    def start_requests(coo):
        # Send the cookie obtained from the first response back as a request
        # header (Cookie is the request header; Set-Cookie is response-only).
        headers['Cookie'] = coo
        s = requests.get(url=url, headers=headers, stream=True, timeout=(5, 5), verify=False)
        s.encoding = 'utf8'
        print(s.status_code)
        print(s)


    while True:
        # Pull random proxies from the pool; use the http:// scheme for both
        # entries (an https:// proxy URL would require a TLS-enabled proxy).
        proxy = {'http': 'http://' + get_random_proxy(), 'https': 'http://' + get_random_proxy()}
        print(proxy)
        try:
            sess = requests.Session()
            sess.keep_alive = False  # close connections instead of keeping them alive
            res = sess.get(url='https://nmpa.gov.cn', headers={'User-Agent': 'Chrome'}, proxies=proxy, timeout=10,
                           verify=False)
            res.close()
            print(res.status_code)
            res.encoding = 'utf8'
            cookie = res.headers['Set-Cookie']
            print(cookie)
            if res.status_code == 200:
                print(res.status_code)
                time.sleep(10)
                start_requests(cookie)
                break
        except Exception as error:
            time.sleep(10)
            print('Connection failed:', error)

Upvotes: 0

Phoenix

Reputation: 341

I was able to elicit a valid response for url2 by passing the headers keyword argument with the User-Agent string set to Chrome:

    r2 = requests.get(url2, proxies=proxies, headers={'User-Agent': 'Chrome'})

To answer your first question, a possible reason is server-side configuration: the server may be set up to reject requests that originate from unknown agents or that are missing a User-Agent header entirely.
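
A quick way to confirm this behaviour (a minimal sketch; the proxy credentials and port are placeholders, exactly as in the question) is to send the same request with and without the header and compare the outcomes:

    import requests

    proxies = {
        'http': 'http://user:[email protected]:xxxx',
        'https': 'http://user:[email protected]:xxxx',
    }
    url2 = 'https://dictionary.cambridge.org/dictionary/english/act'

    # Without a User-Agent the server drops the connection mid-handshake,
    # which requests surfaces as a ProxyError.
    try:
        requests.get(url2, proxies=proxies, timeout=10)
    except requests.exceptions.ProxyError as e:
        print('without User-Agent:', e)

    # With a User-Agent header, the same request should go through.
    r2 = requests.get(url2, proxies=proxies, timeout=10,
                      headers={'User-Agent': 'Chrome'})
    print('with User-Agent:', r2.status_code)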

Upvotes: 6
