Trying to get product title from Amazon Review URL

Question

The URL I'm using is: https://www.amazon.com/Marmot-womens-Precip-Lightweight-Waterproof/product-reviews/B086YML34N/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews

product_name = soup.findall("h1",{"class":"a-size-large a-text-ellipsis"})

I'm trying to get the product title for all the reviews on the page. I've tried the code above, but I get a "NoneType" error. The other attributes in my code work, so I think I'm not picking the right class. Has anyone with experience scraping Amazon got advice on what I'm doing wrong?
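One detail in the snippet above is worth checking separately from the page content: BeautifulSoup's method is `find_all` (legacy alias `findAll`), not `findall`. BeautifulSoup treats an unknown attribute such as `soup.findall` as `soup.find("findall")`, which finds no `<findall>` tag and returns `None`; calling that `None` raises a `TypeError` mentioning `'NoneType'`. The minimal offline sketch below uses a made-up HTML fragment (an assumption, not Amazon's real markup) to compare `find_all` with a space-separated class string against the CSS selector used in the accepted answer.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for the review page;
# Amazon's real markup may differ.
html = """
<h1 class="a-size-large a-text-ellipsis">Sample Jacket</h1>
<h1 class="a-text-ellipsis a-size-large">Same classes, other order</h1>
<h1 class="a-size-large">Only one class</h1>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all with a space-separated class string matches the exact
# attribute value, so the order of the classes matters.
exact = soup.find_all("h1", {"class": "a-size-large a-text-ellipsis"})

# A CSS selector requires both classes but ignores their order.
css = soup.select("h1.a-size-large.a-text-ellipsis")

print([h.get_text() for h in exact])  # ['Sample Jacket']
print([h.get_text() for h in css])    # ['Sample Jacket', 'Same classes, other order']
```

If the class order on the live page ever changes, the CSS selector keeps working while the exact-string `find_all` call silently returns an empty list.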

Andrej Kesely · Accepted Answer

You probably got a CAPTCHA page. Try specifying different HTTP headers (in my case, User-Agent and Accept-Language):
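To see what the headers change, here is a small offline sketch using `requests.Session.prepare_request`, which builds the request without sending it. By default `requests` identifies itself as `python-requests/<version>`; the idea that Amazon serves a CAPTCHA based on that is the answer's hypothesis, not something this code can confirm. The `session` variable and the session-based setup are illustrative choices, not part of the original answer.

```python
import requests

# Same headers as in the answer below.
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0",
    "Accept-Language": "en-US,en;q=0.5",
}
url = "https://www.amazon.com/Marmot-womens-Precip-Lightweight-Waterproof/product-reviews/B086YML34N/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews"

session = requests.Session()

# Without overrides, the User-Agent is "python-requests/<version>".
default_req = session.prepare_request(requests.Request("GET", url))
print(default_req.headers["User-Agent"])

# Session-level headers are merged into every request the session builds.
session.headers.update(headers)
browser_req = session.prepare_request(requests.Request("GET", url))
print(browser_req.headers["User-Agent"])
print(browser_req.headers["Accept-Language"])

# Nothing has been sent yet; session.get(url) would actually fetch the page.
```

A session is convenient when scraping several review pages, since the headers (and any cookies Amazon sets) are reused on each request.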

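import requests
from bs4 import BeautifulSoup


url = 'https://www.amazon.com/Marmot-womens-Precip-Lightweight-Waterproof/product-reviews/B086YML34N/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews'

# Browser-like headers; without them Amazon may return a CAPTCHA page instead.
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0',
           'Accept-Language': 'en-US,en;q=0.5'}

soup = BeautifulSoup(requests.get(url, headers=headers, timeout=10).content, 'html.parser')

# select_one() returns None when the title is missing (e.g. on a CAPTCHA page).
title = soup.select_one('h1.a-size-large.a-text-ellipsis')
print(title.text if title is not None else 'Title not found (possibly a CAPTCHA page)')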

Prints:

Marmot womens Precip Lightweight Waterproof Rain Jacket
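If you scrape many pages, it can help to keep the parsing step separate from the fetching step so it can be tested offline and so a missing title returns `None` instead of raising `AttributeError` on `None.text`. The sketch below assumes the selector from the answer is still valid; `extract_title` and `TITLE_SELECTOR` are hypothetical names, and the HTML strings are made up for illustration, not real Amazon responses.

```python
from typing import Optional

from bs4 import BeautifulSoup

# Selector taken from the answer; Amazon may change its markup at any time.
TITLE_SELECTOR = "h1.a-size-large.a-text-ellipsis"


def extract_title(html: str) -> Optional[str]:
    """Return the product title from a review page, or None if it is missing.

    A missing title usually means a different page came back (for example
    a CAPTCHA page), so the caller can retry or log instead of crashing.
    """
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.select_one(TITLE_SELECTOR)
    if heading is None:
        return None
    # strip=True trims the whitespace around the title text.
    return heading.get_text(strip=True)


# Made-up HTML fragments for offline use.
review_page = '<h1 class="a-size-large a-text-ellipsis"> Marmot womens Precip Lightweight Waterproof Rain Jacket </h1>'
other_page = "<html><body><p>placeholder for a CAPTCHA page</p></body></html>"

print(extract_title(review_page))  # Marmot womens Precip Lightweight Waterproof Rain Jacket
print(extract_title(other_page))   # None
```

With the answer's code, you would pass `requests.get(url, headers=headers).text` to `extract_title` and check for `None` before using the result.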
