Reputation: 21
I am trying to scrape a page with Beautiful Soup (bs4), but the request itself fails before I get any data. I have already set headers, as suggested in this answer (Stackoverflow Question). This is my code:
from bs4 import BeautifulSoup
import requests
headers = {
'Referer': 'hello',
}
r = requests.get('https://www.doamin.com/bangalore/restaurants', headers=headers)
print(r.status_code)
This is the error I am getting:
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
and this
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
I also tried setting a User-Agent header:
import requests
url = 'https://www.example.com/bangalore/restaurants'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
response = requests.get(url, headers=headers)
print(response.content)
But I still get the same error.
Can anyone help me out?
Upvotes: 2
Views: 1437
Reputation: 45473
My guess is that the server validates the user agent string more strictly: if the user agent claims to be Chrome, it checks the version against a list of real Chrome releases. The version you specified (41.0.2228) does not appear in the Chrome version history. Try 41.0.2272 instead, for instance:
import requests
url = 'https://www.example.com/bangalore/restaurants'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.0 Safari/537.36'}
response = requests.get(url, headers=headers)
print(response.content)
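If you want to try several version strings without the script crashing each time the server drops the connection, something like the following may help. It is only a rough sketch: the helper names build_headers and fetch_status are made up for illustration, the 10-second timeout is an arbitrary choice, and there is no guarantee that any particular version string will be accepted by the server.

```python
def build_headers(chrome_version="41.0.2272.0"):
    """Return request headers with a Chrome-style User-Agent (hypothetical helper)."""
    user_agent = (
        "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
        f"(KHTML, like Gecko) Chrome/{chrome_version} Safari/537.36"
    )
    return {"User-Agent": user_agent}


def fetch_status(url, headers, timeout=10):
    """Return the HTTP status code, or None if the connection fails (hypothetical helper)."""
    # Imported here so build_headers can be used even where requests is not installed.
    import requests

    try:
        response = requests.get(url, headers=headers, timeout=timeout)
    except requests.exceptions.ConnectionError as exc:
        # RemoteDisconnected surfaces through requests as a ConnectionError.
        print(f"Connection failed: {exc}")
        return None
    return response.status_code


if __name__ == "__main__":
    url = "https://www.example.com/bangalore/restaurants"
    # The second version is the one from the question, kept for comparison.
    for version in ("41.0.2272.0", "41.0.2228.0"):
        print(version, fetch_status(url, build_headers(version)))
```

A None result means the connection was dropped or could not be established, which is the same failure as in the question, just reported without a traceback.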
Upvotes: 1
Reputation: 73
Most likely Zomato (like many other data-collecting websites) has implemented measures to block scrapers and data miners. Use their API instead: https://developers.zomato.com/api
Upvotes: 0