Reputation: 779
The URL I'm using is: https://www.amazon.com/Marmot-womens-Precip-Lightweight-Waterproof/product-reviews/B086YML34N/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews
product_name = soup.findall("h1",{"class":"a-size-large a-text-ellipsis"})
I'm trying to get the product title for all the reviews on the page. I tried the code above, but I get a "NoneType" error. The rest of the attributes in my code work, so I suspect I'm not picking the right class. Does anyone with experience scraping Amazon have advice on what I'm doing wrong?
Upvotes: 1
Views: 132
Reputation: 8215
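It is because there is no findall method in bs4. It is either find_all or findAll (capital A). BeautifulSoup treats an unknown attribute such as soup.findall as a lookup for a <findall> tag, which returns None, and calling None is what raises the "NoneType" error.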
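To see the difference without hitting Amazon, here is a minimal sketch that runs against an inline HTML snippet. The snippet and the "Example Product" text are made up for illustration; only the class names come from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page, reusing the classes from the question
html = '<h1 class="a-size-large a-text-ellipsis">Example Product</h1>'
soup = BeautifulSoup(html, "html.parser")

# Attribute access on a soup object is shorthand for soup.find("<name>"),
# so soup.findall looks for a <findall> tag and comes back as None.
missing = soup.findall
print(missing)

# find_all is the real method and returns a list of matching tags.
titles = soup.find_all("h1", {"class": "a-size-large a-text-ellipsis"})
title_texts = [t.get_text(strip=True) for t in titles]
print(title_texts)
```

Calling `soup.findall(...)` therefore amounts to calling `None(...)`, which fails with a TypeError saying that a 'NoneType' object is not callable.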
Upvotes: 1
Reputation: 195438
You probably got a captcha page. Try specifying different HTTP headers (in my case User-Agent and Accept-Language):
import requests
from bs4 import BeautifulSoup

url = 'https://www.amazon.com/Marmot-womens-Precip-Lightweight-Waterproof/product-reviews/B086YML34N/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews'
# Browser-like headers; without them the response may be a captcha page
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0',
    'Accept-Language': 'en-US,en;q=0.5',
}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
# Select the <h1> that carries both classes and print its text
print(soup.select_one('h1.a-size-large.a-text-ellipsis').text)
Prints:
Marmot womens Precip Lightweight Waterproof Rain Jacket
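If the request still lands on a page without that heading, select_one returns None and the .text access fails. A small guard makes that case explicit. This is only a sketch: extract_title is a hypothetical helper, and both HTML strings are invented placeholders, not real Amazon responses.

```python
from bs4 import BeautifulSoup


def extract_title(html):
    """Return the heading text, or None when the heading is not on the page."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one("h1.a-size-large.a-text-ellipsis")
    # select_one returns None when nothing matches, so check before reading text
    return tag.get_text(strip=True) if tag is not None else None


# Invented placeholder pages for illustration
product_page = '<h1 class="a-size-large a-text-ellipsis"><a>Example Product</a></h1>'
other_page = "<html><body><p>Placeholder page without the heading</p></body></html>"

print(extract_title(product_page))
print(extract_title(other_page))
```

A None result is then a hint to inspect the raw response (for example, whether it is a captcha page) before changing the selector.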
Upvotes: 1