Omar Kamel Mostafa

Reputation: 63

Unable to extract item title in Amazon

When I try to extract the title of a Sony headset with the code below, the result is empty: no matching element is found.

import requests    
from bs4 import BeautifulSoup

URL = 'https://www.amazon.com/Sony-Noise-Cancelling-Headphones-WH1000XM3/dp/B07G4MNFS1/ref=sxin_0_ac_d_rm?ac_md=0-0-c29ueQ%3D%3D-ac_d_rm&keywords=sony&pd_rd_i=B07G4MNFS1&pd_rd_r=3e6d5325-8ee4-4ba8-a84f-1b7cf2ce98bf&pd_rd_w=BVSFq&pd_rd_wg=I0LMZ&pf_rd_p=e2f20af2-9651-42af-9a45-89425d5bae34&pf_rd_r=VGT25BXXZNDE3B61A994&psc=1&qid=1577253649&smid=ATVPDKIKX0DER'

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36"}

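page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, "html.parser")

# print(soup.prettify())  # uncomment to inspect the HTML that was actually returned

# find_all returns a list of every <span id="productTitle"> in the parsed page
title = soup.find_all('span', {'id': 'productTitle'})

print(title, len(title))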

Current output:

[] 0

Upvotes: 0

Views: 403

Answers (2)

Matthew Gaiser

Reputation: 4803

I spent the last two hours trying to scrape that title with BeautifulSoup. I also tried scraping other elements on the page, with no success. I tried writing the raw content to a file, and that failed because of strange characters in the response.

I tried Ahmed's answer and still got none. I tried a bunch of other solutions I found online and still got none. I can't for the life of me figure out how to use BeautifulSoup to scrape this.

I know you use Selenium, so here is the Selenium solution.

from selenium import webdriver
from selenium.webdriver.common.by import By

bot = webdriver.Chrome()  # needs a local Chrome install and a matching driver
bot.get("https://www.amazon.com/Sony-Noise-Cancelling-Headphones-WH1000XM3/dp/B07G4MNFS1/ref=sxin_0_ac_d_rm?ac_md=0-0-c29ueQ==-ac_d_rm&keywords=sony&pd_rd_i=B07G4MNFS1&pd_rd_r=3e6d5325-8ee4-4ba8-a84f-1b7cf2ce98bf&pd_rd_w=BVSFq&pd_rd_wg=I0LMZ&pf_rd_p=e2f20af2-9651-42af-9a45-89425d5bae34&pf_rd_r=VGT25BXXZNDE3B61A994&psc=1&qid=1577253649&smid=ATVPDKIKX0DER")
# find_element_by_id was removed in Selenium 4; find_element(By.ID, ...) replaces it
title = bot.find_element(By.ID, 'productTitle').text
print(title)
bot.quit()  # close the browser and end the driver session

Upvotes: 1

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.amazon.com/Sony-Noise-Cancelling-Headphones-WH1000XM3/dp/B07G4MNFS1/ref=sxin_0_ac_d_rm?ac_md=0-0-c29ueQ==-ac_d_rm&keywords=sony&pd_rd_i=B07G4MNFS1&pd_rd_r=3e6d5325-8ee4-4ba8-a84f-1b7cf2ce98bf&pd_rd_w=BVSFq&pd_rd_wg=I0LMZ&pf_rd_p=e2f20af2-9651-42af-9a45-89425d5bae34&pf_rd_r=VGT25BXXZNDE3B61A994&psc=1&qid=1577253649&smid=ATVPDKIKX0DER")
soup = BeautifulSoup(r.text, 'html.parser')

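# find_all is the current name; findAll is a legacy alias
for item in soup.find_all("span", {'id': 'productTitle'}):
    # strip=True trims the whitespace around the title text
    print(item.get_text(strip=True))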

Output:

Sony Noise Cancelling Headphones WH1000XM3: Wireless Bluetooth Over the Ear Headphones with Mic and Alexa voice control - Industry Leading Active Noise Cancellation - Black
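
The same request can print the title for one person and an empty result for another, so it helps to check what HTML actually came back before blaming the parser. Below is a minimal sketch of that check, not a definitive diagnosis. It uses only the standard library so it runs offline. The names `TitleFinder`, `extract_title` and `describe_response` are hypothetical helpers, both HTML samples are made up for illustration, and treating the word "captcha" as the sign of a bot-check page is an assumption about what the site returns.

```python
from html.parser import HTMLParser


class TitleFinder(HTMLParser):
    """Collects the text inside the first <span id="productTitle">."""

    def __init__(self):
        super().__init__()
        self._depth = 0          # > 0 while we are inside the target span
        self.found = False       # True once the target span has been seen
        self.title_parts = []    # text chunks collected from inside the span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            self._depth += 1     # a span nested inside the target span
        elif not self.found and dict(attrs).get("id") == "productTitle":
            self._depth = 1
            self.found = True

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.title_parts.append(data)


def extract_title(html):
    """Return the product title text, or None if the span is absent."""
    finder = TitleFinder()
    finder.feed(html)
    finder.close()
    if not finder.found:
        return None
    # Collapse newlines and runs of spaces into single spaces.
    return " ".join("".join(finder.title_parts).split())


def describe_response(html):
    """Say whether the HTML holds a title or looks like a block page."""
    title = extract_title(html)
    if title is not None:
        return "title: " + title
    if "captcha" in html.lower():  # assumed marker of a bot-check page
        return "no productTitle; page looks like a CAPTCHA/bot check"
    return "no productTitle; inspect the returned HTML"


# Made-up samples: a product page fragment and a hypothetical block page.
product_html = (
    '<html><body><span id="productTitle" class="a-size-large">\n'
    "  Sony Noise Cancelling Headphones WH1000XM3\n"
    "</span></body></html>"
)
blocked_html = (
    '<html><body><form action="/errors/validateCaptcha">'
    "Enter the characters you see below</form></body></html>"
)

print(describe_response(product_html))
print(describe_response(blocked_html))
```

With a live request you would pass `r.text` to `describe_response` instead of the samples. If it reports a block page, changing the parser will not help, and a real browser session such as the Selenium approach above is the more likely fix.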


Upvotes: 1
