Reputation: 1
I am trying to do some web scraping (for the Automate the Boring Stuff with Python Udemy course), but I keep getting the HTTPError: 403 Client Error: HTTP Forbidden for url:
error. Here is the code I have been working with:
import bs4
import requests
ro = requests.get('https://www.amazon.com/Automate-Boring-Stuff-Python-Programming/dp/1593275994/')
ro.raise_for_status()
And here's the error message I have been getting:
Traceback (most recent call last):
File "<pyshell#3>", line 1, in <module>
ro.raise_for_status()
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: HTTP Forbidden for url: https://www.carsales.com.au/cars/details/2012-mazda-3-neo-bl-series-2-auto/SSE-AD-6368302/
I have read online about changing the user agent, but I don't understand what that is or how to do it. Can anyone offer some help here? I am completely lost and can't seem to find any web-scraping information anywhere. I am on a Mac, if that helps at all. Thanks.
Upvotes: 0
Views: 345
Reputation: 3051
The requests package allows you to set a custom user agent, which makes the server think the request is coming from a regular browser rather than a script.
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0'}
ro = requests.get('https://www.amazon.com/Automate-Boring-Stuff-Python-Programming/dp/1593275994/', headers=headers)
ro.raise_for_status()
soup = BeautifulSoup(ro.text, 'html.parser')
print(soup.prettify())
Upvotes: 1
Reputation: 196
First, I would suggest replacing ro.raise_for_status()
with a check on ro.status_code
and if statements (Python has no switch-case statement); if you do want to keep ro.raise_for_status(),
wrap it in a try/except block. Regarding the error: Amazon appears to block requests that use the requests
module's default user agent. To work around this, change the user agent to something like: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36
. For further information on implementing this, please check this page, the "Using Python Requests" section.
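Putting both suggestions together, a minimal sketch might look like the following. The URL is a hypothetical placeholder, so substitute the page you actually want to fetch:

```python
import requests

# Placeholder URL for illustration; replace with the page you want to scrape.
URL = 'https://www.example.com/'

# A browser-like user agent, as suggested above
headers = {
    'User-Agent': (
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) '
        'AppleWebKit/537.36 (KHTML, like Gecko) '
        'Chrome/35.0.1916.47 Safari/537.36'
    ),
}

try:
    ro = requests.get(URL, headers=headers, timeout=10)

    # Option 1: branch on the status code explicitly
    if ro.status_code == 200:
        print('OK')
    elif ro.status_code == 403:
        print('Forbidden: the server rejected the request')
    else:
        print(f'Unexpected status: {ro.status_code}')

    # Option 2: keep raise_for_status(), but catch the HTTPError it raises
    ro.raise_for_status()
except requests.exceptions.HTTPError as err:
    print(f'HTTP error: {err}')
except requests.exceptions.RequestException as err:
    # Covers connection failures, timeouts, etc.
    print(f'Request failed: {err}')
```

Either option works; the try/except version is closer to your original code and keeps the error message for logging.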
P.S.: please make sure to check whether web scraping Amazon is legal.
Upvotes: 0