LifeLearner

Reputation: 39

AttributeError: 'NoneType' object has no attribute 'get_text' python web-scraping

I'm following this tutorial and got the error below even though I followed every step. The tutorial is at https://www.youtube.com/watch?v=Bg9r_yLk7VY&t=241s, and my code is below.

import requests
from bs4 import BeautifulSoup

URL = 'https://www.amazon.com/-/de/dp/B07RF1XD36/ref=lp_16225007011_1_6?s=computers-intl-ship&ie=UTF8&qid=1581249551&sr=1-6'

headers ={"User-Agent": 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36'}

page = requests.get(URL, headers=headers)

soup = BeautifulSoup(page.content, 'html.parser')

title = soup.find(id="productTitle").get_text()

print(title.strip())

This is the error message I get when I run the code:

Traceback (most recent call last):
  File "scraper.py", line 26, in <module>
    title = soup.find(id="productTitle").get_text()
AttributeError: 'NoneType' object has no attribute 'get_text'

Upvotes: 0

Views: 1489

Answers (1)

SIM

Reputation: 22440

To get the product title from that page, all you need to do is change the parser from html.parser to html5lib or lxml. Both are more lenient with malformed HTML, which in this case is what stops html.parser from finding the title element, so soup.find() returns None. I've also used a random user agent in the script to make it more robust.

Working code:

import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent  # third-party: pip install fake-useragent

ua = UserAgent()

URL = 'https://www.amazon.com/-/de/dp/B07RF1XD36/ref=lp_16225007011_1_6?s=computers-intl-ship&ie=UTF8&qid=1581249551&sr=1-6'

# Send a randomly chosen User-Agent string with the request
page = requests.get(URL, headers={'User-Agent':ua.random})
# html5lib is a separate package: pip install html5lib
soup = BeautifulSoup(page.text, 'html5lib')
title = soup.find(id="productTitle").get_text(strip=True)
print(title)
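
If the element is still missing for some reason, soup.find() returns None and the same AttributeError comes back. A minimal sketch of a guard against that is below. The helper name get_text_or_none and the sample HTML in demo() are my own illustrations, not part of the tutorial or of BeautifulSoup.

```python
def get_text_or_none(soup, element_id):
    """Return the stripped text of the element with this id, or None if it is absent."""
    element = soup.find(id=element_id)
    if element is None:
        # find() found nothing; return None instead of calling get_text() on None
        return None
    return element.get_text(strip=True)


def demo():
    # Imported here so the helper above has no hard dependency on bs4.
    from bs4 import BeautifulSoup

    # Hypothetical sample markup, standing in for the downloaded page
    html = '<html><body><span id="productTitle">  Example product  </span></body></html>'
    soup = BeautifulSoup(html, "html.parser")

    print(get_text_or_none(soup, "productTitle"))
    print(get_text_or_none(soup, "missing"))
```

In the script above you could then write title = get_text_or_none(soup, "productTitle") and check whether title is None before printing it.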

Upvotes: 3
