Reputation: 51

I am not able to scrape web data from the given website using Python

Hi, I am trying to scrape data from https://health.usnews.com/doctors/city-index/new-jersey. I want to collect all the city names and then follow each city's link to scrape more data. However, the request fails when I use the requests library in Python. It looks like a session, cookies, or something similar is blocking the crawl. Please help me out.

>>> import requests
>>> url = 'https://health.usnews.com/doctors/city-index/new-jersey'
>>> html_content = requests.get(url)
>>> html_content.status_code
403
>>> html_content.content
'<HTML><HEAD>\n<TITLE>Access Denied</TITLE>\n</HEAD><BODY>\n<H1>Access Denied</H1>\n \nYou don\'t have permission to access "http&#58;&#47;&#47;health&#46;usnews&#46;com&#47;doctors&#47;city&#45;index&#47;new&#45;jersey" on this server.<P>\nReference&#32;&#35;18&#46;7d70b17&#46;1528874823&#46;3fac5589\n</BODY>\n</HTML>\n'
>>>

The 403 "Access Denied" response above is the error I am getting.

Upvotes: 1

Answers (2)

Nazim Kerimbekov

Reputation: 4783

First of all, as the previous answer suggested, I recommend adding a User-Agent header to your request. Your code should then look something like this:

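import requests

# Send a browser-like User-Agent instead of the default one used by requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:60.0) Gecko/20100101 Firefox/60.0'}
url = 'https://health.usnews.com/doctors/city-index/new-jersey'
html_content = requests.get(url, headers=headers)
# A bare expression prints nothing outside the REPL, so print the status explicitly
print(html_content.status_code)
print(html_content.text)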

Upvotes: 1

peeyush113

Reputation: 120

You need to add a User-Agent header to your request so that the site thinks the request comes from a real browser.

import requests

url = 'https://health.usnews.com/doctors/city-index/new-jersey'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
html_content = requests.get(url, headers=headers)
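
Once the request goes through, the next step from the question (collecting the city names and their links) could look roughly like the sketch below. It is only a sketch under assumptions: the names LinkCollector, extract_links, fetch_city_links and the path_hint filter are hypothetical helpers made up for illustration, and the page markup was not inspected, so the filter may need adjusting. It uses the standard library html.parser so that nothing beyond requests is needed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://health.usnews.com/doctors/city-index/new-jersey"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:60.0) "
                  "Gecko/20100101 Firefox/60.0"
}


class LinkCollector(HTMLParser):
    """Collect (link text, absolute URL) pairs for every <a href=...> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the <a> tag currently being read, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text
            text = " ".join("".join(self._text).split())
            self.links.append((text, self._href))
            self._href = None


def extract_links(html, base_url=BASE_URL):
    """Return all (text, url) pairs found in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch_city_links(url=BASE_URL, path_hint="/doctors/"):
    """Download the index page and keep links whose URL contains path_hint.

    path_hint is an assumption about how city links look on this site.
    """
    import requests  # third-party; imported here so the parser works without it

    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()  # turns a 403 into an exception
    return [(name, link)
            for name, link in extract_links(response.text, url)
            if path_hint in link]
```

Calling fetch_city_links() would return a list of (city name, URL) pairs that you can loop over and request one by one with the same headers. If the site still answers 403, raise_for_status() will raise an exception instead of letting the error page be parsed silently.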

Upvotes: 1
