Reputation: 13
I am trying to scrape an Amazon Alexa Skill: https://www.amazon.com/PayPal/dp/B075764QCX/ref=sr_1_1?dchild=1&keywords=paypal&qid=1604026451&s=digital-skills&sr=1-1
For now, I am just trying to get the name of the skill (PayPal), but for some reason this returns an empty list. I have inspected the element in the browser, so I know the page should contain the name, and I am not sure what is going wrong. My code is below:
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

request = Request(skill_url, headers=request_headers)
response = urlopen(request)
response = response.read()
html = response.decode()
soup = BeautifulSoup(html, 'html.parser')
name = soup.find_all("h1", {"class": "a2s-title-content"})
Upvotes: 1
Views: 267
Reputation: 195573
Try setting the User-Agent and Accept-Language HTTP headers to prevent the server from sending you a Captcha page:
import requests
from bs4 import BeautifulSoup
headers = {
'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:82.0) Gecko/20100101 Firefox/82.0',
'Accept-Language': 'en-US,en;q=0.5'
}
url = 'https://www.amazon.com/PayPal/dp/B075764QCX/ref=sr_1_1?dchild=1&keywords=paypal&qid=1604026451&s=digital-skills&sr=1-1'
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'lxml')
name = soup.find("h1", {"class" : "a2s-title-content"})
print(name.get_text(strip=True))
Prints:
PayPal
Upvotes: 0
Reputation: 1856
The page content is loaded with JavaScript, so you can't just use BeautifulSoup to scrape it. You have to use another module such as Selenium to simulate JavaScript execution.
Here is an example:
from bs4 import BeautifulSoup as soup
from selenium import webdriver

url = 'YOUR URL'

# launch Firefox (requires geckodriver on your PATH)
driver = webdriver.Firefox()
driver.get(url)

# grab the rendered HTML after JavaScript has run
page = driver.page_source

page_soup = soup(page, 'html.parser')
containers = page_soup.find_all("h1", {"class": "a2s-title-content"})
print(containers)
print(len(containers))
You can also use chrome-driver or edge-driver instead, as shown below.
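For example, here is a minimal sketch of the same scrape using the Chrome driver instead of Firefox (this assumes chromedriver is installed and available on your PATH; webdriver.Edge() works the same way with the Edge driver):
from bs4 import BeautifulSoup as soup
from selenium import webdriver

url = 'YOUR URL'

# Chrome instead of Firefox (webdriver.Edge() for the Edge driver)
driver = webdriver.Chrome()
driver.get(url)

# rendered HTML after JavaScript has run
page = driver.page_source
driver.quit()

page_soup = soup(page, 'html.parser')
containers = page_soup.find_all("h1", {"class": "a2s-title-content"})
print(containers)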
Upvotes: 1