Reputation: 11
I am trying to scrape https://www.vitals.com/locations/primary-care-doctors/ny. I have been able to scrape other sites by editing my headers, but I keep getting a 403 error with this one.
from bs4 import BeautifulSoup
import requests

with requests.Session() as se:
    se.headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
        "Accept-Language": "en-US,en;q=0.9",
    }

test_sites = [
    'http://fashiontoast.com/',
    'https://www.vitals.com/locations/primary-care-doctors/ny',
    'http://www.seaofshoes.com/',
]

for site in test_sites:
    print(site)
    # get page source
    response = se.get(site)
    print(response)
    # print(response.text)
Upvotes: 1
Views: 267
Reputation: 1047
Try moving the rest of the code inside the with statement, as follows:
from bs4 import BeautifulSoup
import requests

with requests.Session() as se:
    se.headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
        "Accept-Language": "en-US,en;q=0.9",
    }

    test_sites = [
        'http://fashiontoast.com/',
        'https://www.vitals.com/locations/primary-care-doctors/ny',
        'http://www.seaofshoes.com/',
    ]

    for site in test_sites:
        print(site)
        # get page source
        response = se.get(site)
        print(response)
        # print(response.text)
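To see at a glance which site refuses the request, the loop can be wrapped in a small helper that records each status code. This is only a rough sketch, not a way around the 403 itself: check_sites, describe and main are hypothetical names I made up for illustration, and main uses se.headers.update(...) instead of assigning to se.headers, on the assumption that you want to keep the default headers requests already sets.

```python
from http import HTTPStatus


def check_sites(session, sites):
    """Fetch each URL with the given session and return {url: status_code}."""
    results = {}
    for site in sites:
        response = session.get(site)
        results[site] = response.status_code
    return results


def describe(status_code):
    """Return a readable label such as '403 Forbidden' for a status code."""
    try:
        return f"{status_code} {HTTPStatus(status_code).phrase}"
    except ValueError:
        # Codes outside the standard set are not members of HTTPStatus.
        return f"{status_code} (non-standard)"


def main():
    # Imported here so the helpers above work without requests installed.
    import requests

    test_sites = [
        'http://fashiontoast.com/',
        'https://www.vitals.com/locations/primary-care-doctors/ny',
        'http://www.seaofshoes.com/',
    ]
    with requests.Session() as se:
        # update() adds to the session's default headers rather than replacing them.
        se.headers.update({
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
            "Accept-Language": "en-US,en;q=0.9",
        })
        for site, code in check_sites(se, test_sites).items():
            print(site, describe(code))
```

Calling main() prints one line per site with its status. If the vitals.com line still shows 403 Forbidden while the others succeed, the server is rejecting the request for a reason beyond where the loop sits relative to the with block.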
Upvotes: 1