QMan5

Reputation: 779

Trying to get product title from Amazon Review Url

The url I'm using is: https://www.amazon.com/Marmot-womens-Precip-Lightweight-Waterproof/product-reviews/B086YML34N/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews

product_name = soup.findall("h1",{"class":"a-size-large a-text-ellipsis"})

I'm trying to get the product title for all the reviews on the page. I've tried the code above, but I get a "NoneType" error. The rest of the attributes in my code do work, so I'm thinking that I'm not picking the right class. Has anyone had experience scraping Amazon who could advise me on what I'm doing wrong?
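One way to check whether the class filter itself is the problem is to try it on a small, made-up snippet (the markup below is an assumption, not Amazon's real page). In bs4, a space-separated class string passed to find_all only matches the exact attribute value, while a CSS selector matches elements that carry both classes in any order:

```python
from bs4 import BeautifulSoup

# Made-up markup: the same two classes as in the question, but in a different order
html = '<h1 class="a-text-ellipsis a-size-large">Example Jacket</h1>'
soup = BeautifulSoup(html, 'html.parser')

# A space-separated class string must equal the whole attribute value
exact = soup.find_all('h1', {'class': 'a-size-large a-text-ellipsis'})
# A CSS selector only requires both classes to be present, in any order
css = soup.select('h1.a-size-large.a-text-ellipsis')

print(len(exact), len(css))  # 0 1
```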

Upvotes: 1

Views: 132

Answers (2)

Bitto

Reputation: 8215

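It's because there is no findall method in bs4. It is either find_all or findAll (capital A). BeautifulSoup treats an unknown attribute such as soup.findall as shorthand for soup.find("findall"), i.e. a search for a <findall> tag; that search returns None, so calling it raises TypeError: 'NoneType' object is not callable.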
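A small offline check of the point above, using a made-up HTML string instead of the real Amazon page (the markup is an assumption): find_all returns a list of matching tags, while the misspelled soup.findall evaluates to None and fails when called.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only mimics the class names from the question
html = '<h1 class="a-size-large a-text-ellipsis">Example Jacket</h1>'
soup = BeautifulSoup(html, 'html.parser')

# Correct spelling: returns a list of matching Tag objects
titles = soup.find_all('h1', {'class': 'a-size-large a-text-ellipsis'})
print([t.get_text(strip=True) for t in titles])  # ['Example Jacket']

# Misspelled: soup.findall is looked up as a child tag named <findall>,
# none exists, so the attribute is None and calling it raises TypeError
try:
    soup.findall('h1')
except TypeError as exc:
    print(exc)  # 'NoneType' object is not callable
```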

Upvotes: 1

Andrej Kesely

Reputation: 195438

You probably got a CAPTCHA page instead of the product page. Try specifying different HTTP headers (in my case User-Agent and Accept-Language):

import requests
from bs4 import BeautifulSoup


url = 'https://www.amazon.com/Marmot-womens-Precip-Lightweight-Waterproof/product-reviews/B086YML34N/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews'
# Browser-like headers; without them Amazon may return a CAPTCHA page instead
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0',
           'Accept-Language': 'en-US,en;q=0.5'}

soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
# select_one() returns the first <h1> that has both classes (or None if there is none)
print(soup.select_one('h1.a-size-large.a-text-ellipsis').text)

Prints:

Marmot womens Precip Lightweight Waterproof Rain Jacket
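If the response is still a CAPTCHA page, select_one() finds nothing and returns None, so .text raises AttributeError. A minimal sketch of a guard, assuming a hypothetical helper named extract_title and made-up HTML strings standing in for real responses:

```python
from typing import Optional

from bs4 import BeautifulSoup

TITLE_SELECTOR = 'h1.a-size-large.a-text-ellipsis'  # selector used in the answer


def extract_title(html: str) -> Optional[str]:
    """Hypothetical helper: return the product title, or None if it is missing."""
    soup = BeautifulSoup(html, 'html.parser')
    tag = soup.select_one(TITLE_SELECTOR)
    # None usually means a different page (for example a CAPTCHA) came back
    return tag.get_text(strip=True) if tag is not None else None


# Made-up pages: one with the title element, one without it
review_page = '<h1 class="a-size-large a-text-ellipsis"> Example Jacket </h1>'
blocked_page = '<html><body><p>Placeholder page without the title</p></body></html>'

print(extract_title(review_page))   # Example Jacket
print(extract_title(blocked_page))  # None
```

Returning None instead of raising lets a loop over many review pages skip or retry the blocked responses.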

Upvotes: 1
