Reputation: 145
When I try to send a request to this website:
import requests
requests.get('https://www.ldoceonline.com/')
an exception is raised:
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
The weird part is that if you access the website the normal way (via a browser), it is fully functional and responds well. You only get this response when you try to retrieve the page from a script.
Any idea how to scrape it successfully?
Upvotes: 10
Views: 21664
Reputation: 22440
Try sending a User-Agent header to get the desired response.
import requests
res = requests.get('https://www.ldoceonline.com/', headers={"User-Agent": "Mozilla/5.0"})
print(res.status_code)
Output:
200
Upvotes: 21
Reputation: 145
If you inspect the requests module's code, you will find the default headers it sends with every request. The User-Agent header mentioned above is among them.
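You don't have to read the source to see those defaults. As a small sketch, `requests.utils.default_headers()` returns the headers that requests adds when you supply none (the variable names here are just for illustration):

```python
import requests

# default_headers() returns the headers requests sends when none are supplied.
default_headers = requests.utils.default_headers()

# The default User-Agent identifies the client as python-requests plus the
# installed version, so the exact string varies between installations.
default_ua = default_headers["User-Agent"]
print(default_ua)
```

The printed value depends on which version of requests you have installed.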
It seems that a number of web resources (whether intentionally or not) do not process requests properly when the User-Agent header is left at the library default, for example "python-requests/2.21.0".
So the practical solution is to send a custom User-Agent header. User-Agent strings for different browsers are provided here.
import requests
url = 'https://www.ldoceonline.com/'
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"}
r = requests.get(url, headers=headers)
r.raise_for_status()
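If you make several requests to the same site, one option is to set the header once on a `requests.Session`. This is only a sketch: `make_session`, `fetch`, the `BROWSER_UA` value, and the 10-second timeout are names and choices of mine for illustration, not anything the site requires.

```python
import requests

# Assumption: any browser-like string; this one is taken from the answer above.
BROWSER_UA = "Mozilla/5.0"


def make_session(user_agent=BROWSER_UA):
    """Return a Session that sends the given User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch(session, url, timeout=10):
    """Return the response, or None if the request fails for any reason."""
    try:
        response = session.get(url, timeout=timeout)
        response.raise_for_status()
        return response
    except requests.exceptions.RequestException as exc:
        # ConnectionError (the error in the question) is a subclass of this.
        print(f"Request to {url} failed: {exc}")
        return None


if __name__ == "__main__":
    session = make_session()
    page = fetch(session, "https://www.ldoceonline.com/")
    if page is not None:
        print(page.status_code)
```

Catching `requests.exceptions.RequestException` also covers timeouts and HTTP error statuses, so the script does not crash if the site drops the connection again.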
Upvotes: 12