Reputation: 51
Hi I ans trying to scrape the data from the site https://health.usnews.com/doctors/city-index/new-jersey . I want all the city name and again from the link scrape the data. But using requests library in python something is going wrong. There are some session or cookies or something which is stopping to crawl the data. please help me out.
>>> import requests
>>> url = 'https://health.usnews.com/doctors/city-index/new-jersey'
>>> html_content = requests.get(url)
>>> html_content.status_code
403
>>> html_content.content
'<HTML><HEAD>\n<TITLE>Access Denied</TITLE>\n</HEAD><BODY>\n<H1>Access Denied</H1>\n \nYou don\'t have permission to access "http://health.usnews.com/doctors/city-index/new-jersey" on this server.<P>\nReference #18.7d70b17.1528874823.3fac5589\n</BODY>\n</HTML>\n'
>>>
Here is the error I am getting.
Upvotes: 1
Views: 2634
Reputation: 4783
First of all, Like the previous answer suggested I would recommend you to add a header to your code, so your code should look something like this:
import requests
headers = {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:60.0) Gecko/20100101 Firefox/60.0'}
url = 'https://health.usnews.com/doctors/city-index/new-jersey'
html_content = requests.get(url, headers=headers)
html_content.status_code
print(html_content.text)
Upvotes: 1
Reputation: 120
You need to add header in your request so that the site think you are a genuine user.
headers = {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
html_content = requests.get(url, headers=headers)
Upvotes: 1