lilpig

Reputation: 145

Unable to requests.get() a website: 'Remote end closed connection without response'

When I try to send a request to this website:

import requests
requests.get('https://www.ldoceonline.com/')

An exception is raised:

requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

The weird part is that if you access the website the normal way (via a browser), it is fully functional and responds just fine. Only when you try to retrieve information by scraping do you get this error.

Any idea how to scrape it successfully?

Upvotes: 10

Views: 21664

Answers (2)

SIM

Reputation: 22440

Try sending a User-Agent header to get the desired response.

import requests

# A browser-like User-Agent is enough for this site to answer the request
res = requests.get('https://www.ldoceonline.com/', headers={"User-Agent": "Mozilla/5.0"})
print(res.status_code)

Output:

200

Upvotes: 21

ash17

Reputation: 145

If you inspect the requests module's source code, you will find the default headers it sends with every request; the above-mentioned User-Agent header is among them.
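You can also print them without reading the source, using requests.utils.default_headers() (the exact version string depends on your installed requests release):

import requests

# Headers that requests sends when you pass none of your own;
# note the User-Agent of the form "python-requests/<version>"
print(requests.utils.default_headers())
# e.g. {'User-Agent': 'python-requests/2.21.0', 'Accept-Encoding': 'gzip, deflate',
#       'Accept': '*/*', 'Connection': 'keep-alive'}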

It seems that a number of web resources (whether intentionally or not) do not process requests properly if the User-Agent header is set to "python-requests/2.21.0".

So the practical solution is to use a custom User-Agent header. User-Agent strings for different browsers are provided here.

import requests

url = 'https://www.ldoceonline.com/'
# Identify as a regular browser instead of python-requests
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"}

r = requests.get(url, headers=headers)
r.raise_for_status()  # raises an HTTPError if the response status is 4xx/5xx
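If you plan to make several requests, a requests.Session lets you set the header once and reuse it for every call (a minimal sketch; the shorter "Mozilla/5.0" string is just an illustrative value, any browser-like User-Agent works):

import requests

session = requests.Session()
# The custom User-Agent is attached to every request made through this session
session.headers.update({"User-Agent": "Mozilla/5.0"})

r = session.get('https://www.ldoceonline.com/')
r.raise_for_status()
print(r.status_code)  # 200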

Upvotes: 12
