Reputation: 39
I'm following this tutorial and I get the error below, even though I followed it exactly. The tutorial is at https://www.youtube.com/watch?v=Bg9r_yLk7VY&t=241s and my code is below:
import requests
from bs4 import BeautifulSoup
URL = 'https://www.amazon.com/-/de/dp/B07RF1XD36/ref=lp_16225007011_1_6?s=computers-intl-ship&ie=UTF8&qid=1581249551&sr=1-6'
headers = {"User-Agent": 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36'}
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
title = soup.find(id="productTitle").get_text()
print(title.strip())
This is the error message I get when I run the code:
Traceback (most recent call last):
File "scraper.py", line 26, in <module>
title = soup.find(id="productTitle").get_text()
AttributeError: 'NoneType' object has no attribute 'get_text'
Upvotes: 0
Views: 1489
Reputation: 22440
To get the title of the product from that page, all you need to do is change the parser from html.parser to html5lib or lxml. The latter two can repair some malformed HTML elements, which in this case prevent html.parser from finding the title. I've also used a random user agent in the script to make it more robust.
Working code:
import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent
# Requires: pip install html5lib fake-useragent
ua = UserAgent()
URL = 'https://www.amazon.com/-/de/dp/B07RF1XD36/ref=lp_16225007011_1_6?s=computers-intl-ship&ie=UTF8&qid=1581249551&sr=1-6'
# Send a randomly chosen browser User-Agent with the request
page = requests.get(URL, headers={'User-Agent': ua.random})
# html5lib (or lxml) repairs malformed markup that html.parser may not handle
soup = BeautifulSoup(page.text, 'html5lib')
title = soup.find(id="productTitle").get_text(strip=True)
print(title)
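Note that soup.find() returns None whenever the element is missing, and that is what raises the AttributeError in the question. Below is a minimal sketch of a more defensive lookup, assuming you want to try several parsers in turn and get None back instead of an exception. The helper name find_product_title and the inline sample HTML are made up for illustration; they are not part of the tutorial or of Beautiful Soup.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def find_product_title(html, parsers=("html5lib", "lxml", "html.parser")):
    """Return the stripped text of the element with id="productTitle", or None.

    Hypothetical helper: tries each parser in order and skips any that
    are not installed.
    """
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            # This parser is not installed; try the next one.
            continue
        tag = soup.find(id="productTitle")
        if tag is not None:
            return tag.get_text(strip=True)
    # No parser found the element (for example, the server sent a different page).
    return None


# Small inline documents stand in for a downloaded page.
sample = '<html><body><span id="productTitle">  Example Laptop  </span></body></html>'
print(find_product_title(sample))  # Example Laptop
print(find_product_title("<html><body><p>No title here</p></body></html>"))  # None
```

With the real page you would pass page.text to the helper and check the result for None before printing it. If it is None, it may be worth looking at page.status_code and the returned HTML to see what the server actually sent.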
Upvotes: 3