I want to get the title of a product from Amazon with bs4

Question

I would like to get the title of this Amazon product using BeautifulSoup and requests. When I run the script, it says:

Traceback (most recent call last):
  File "scraper.py", line 15, in 
    title = soup.find('span', id='productTitle').get_text()
AttributeError: 'NoneType' object has no attribute 'get_text'

Please help me.

import bs4
import requests
from bs4 import BeautifulSoup
from urllib.request import urlopen

url = 'https://www.amazon.de/OnePlus-Smartphone-Almond-Display-Speicher/dp/B07RWL3K1Y/ref=sr_1_2? __mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=oneplus+7+pro&qid=1598088298&sr=8-2'

headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36'
}
page = requests.get(url, headers = headers)

soup = BeautifulSoup(page.content, 'html.parser')
title = soup.find('span', id='productTitle').get_text()
print(title)

kerasbaz · Accepted Answer

The issue is the use of 'html.parser' as your bs4 parser. Try lxml instead, which handles broken HTML more gracefully. The error is telling you that find() returned None, i.e. it never found the <span id="productTitle"> element. We can see it's there in the page, so it's probably a parsing failure related to non-standard HTML.

import requests
from bs4 import BeautifulSoup

url = 'https://www.amazon.de/OnePlus-Smartphone-Almond-Display-Speicher/dp/B07RWL3K1Y/ref=sr_1_2? __mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=oneplus+7+pro&qid=1598088298&sr=8-2'

headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36'
}
}
page = requests.get(url, headers = headers)

soup = BeautifulSoup(page.content, 'lxml')
title = soup.find('span', id='productTitle').get_text().strip()
print(title)

Output:

OnePlus 7 Pro Smartphone Almond (16,9 cm) AMOLED Display 8 GB RAM + 256 GB Speicher, Triple Kamera (48 MP) Pop-up Kamera (16 MP) – Dual SIM Handy
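
Whichever parser is used, find() returns None when nothing matches, and that is what raises the AttributeError in the question. Below is a minimal sketch of guarding against that case. It runs on an inline HTML string instead of a live request, so SAMPLE_HTML and the extract_title helper are illustrative assumptions, not part of the original code, and the sample markup is far simpler than a real product page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (not real Amazon markup).
SAMPLE_HTML = """
<html><body>
  <span id="productTitle">
    Example Product Title
  </span>
</body></html>
"""


def extract_title(html, parser="html.parser"):
    """Return the stripped product title, or None if the span is missing."""
    soup = BeautifulSoup(html, parser)
    tag = soup.find("span", id="productTitle")
    if tag is None:
        # find() found no match; return None instead of calling get_text() on it.
        return None
    return tag.get_text(strip=True)


print(extract_title(SAMPLE_HTML))
print(extract_title("<html><body>No title here</body></html>"))
```

Passing parser="lxml" to the helper would switch to the parser suggested above, provided the lxml package is installed. Checking for None also makes it easier to tell a parsing problem apart from a response that simply does not contain the element.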
