junev

Reputation: 13

Beautiful Soup Python findAll returning empty list

I am trying to scrape an Amazon Alexa Skill: https://www.amazon.com/PayPal/dp/B075764QCX/ref=sr_1_1?dchild=1&keywords=paypal&qid=1604026451&s=digital-skills&sr=1-1

For now, I am just trying to get the name of the skill (PayPal), but find_all returns an empty list. I checked the page with the browser's Inspect Element and the h1 with that class is there, so I am not sure what is going wrong. My code is below:

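from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

# skill_url and request_headers are defined elsewhere (not shown here)
request = Request(skill_url, headers=request_headers)
response = urlopen(request)
html = response.read().decode()
soup = BeautifulSoup(html, 'html.parser')

name = soup.find_all("h1", {"class" : "a2s-title-content"})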

Upvotes: 1

Views: 267

Answers (2)

Andrej Kesely

Reputation: 195573

Try setting the User-Agent and Accept-Language HTTP headers to prevent the server from sending you a Captcha page:

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:82.0) Gecko/20100101 Firefox/82.0',
    'Accept-Language': 'en-US,en;q=0.5'
}

url = 'https://www.amazon.com/PayPal/dp/B075764QCX/ref=sr_1_1?dchild=1&keywords=paypal&qid=1604026451&s=digital-skills&sr=1-1'

soup = BeautifulSoup(requests.get(url, headers=headers).content, 'lxml')
name = soup.find("h1", {"class" : "a2s-title-content"})
print(name.get_text(strip=True))

Prints:

PayPal
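
If the list is still empty, it may help to check whether the server sent a Captcha page instead of the skill page. Below is a minimal sketch using only the standard library. The helper names build_request and looks_like_captcha are hypothetical, and the marker strings are an assumption about what such a page might contain, not something confirmed from Amazon's actual responses.

```python
from urllib.request import Request


def build_request(url, user_agent, accept_language="en-US,en;q=0.5"):
    """Build a urllib Request carrying browser-like headers (hypothetical helper)."""
    return Request(
        url,
        headers={
            "User-Agent": user_agent,
            "Accept-Language": accept_language,
        },
    )


def looks_like_captcha(html):
    """Rough heuristic: True if the HTML contains a typical bot-check phrase.

    The marker strings are assumptions chosen for illustration only.
    """
    markers = ("captcha", "robot check")
    lowered = html.lower()
    return any(marker in lowered for marker in markers)
```

You could pass the result of build_request to urlopen, decode the body, and call looks_like_captcha on it before parsing. If it returns True, the selector is probably fine and the headers are what need changing.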

Upvotes: 0

Darkknight

Reputation: 1856

The page content is loaded with JavaScript, so BeautifulSoup alone cannot scrape it. You need a tool such as Selenium that runs the page's JavaScript in a real browser before you parse the HTML.

Here is an example:

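from bs4 import BeautifulSoup
from selenium import webdriver

url = 'YOUR URL'

# Start a real browser so the page's JavaScript runs before parsing
driver = webdriver.Firefox()
try:
    driver.get(url)
    page = driver.page_source  # HTML after the browser has rendered the page
finally:
    driver.quit()  # always close the browser

page_soup = BeautifulSoup(page, 'html.parser')

containers = page_soup.find_all("h1", {"class" : "a2s-title-content"})
print(containers)
print(len(containers))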

You can also use ChromeDriver or EdgeDriver instead of Firefox.
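
Before reaching for a browser, it can be worth checking whether the element is already in the HTML the server returns, since that tells you if JavaScript rendering is really the problem. Here is a rough sketch using only the standard library's html.parser. ClassFinder and find_static_text are hypothetical names made up for this example, and the parser only handles simple, non-nested matches.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Collect the text inside every <tag> whose class list contains class_name."""

    def __init__(self, tag, class_name):
        super().__init__()
        self.tag = tag
        self.class_name = class_name
        self._inside = False
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # The class attribute may be missing or valueless, hence the "or ''"
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.class_name in classes:
            self._inside = True
            self.matches.append("")

    def handle_data(self, data):
        if self._inside:
            self.matches[-1] += data

    def handle_endtag(self, tag):
        if tag == self.tag:
            self._inside = False


def find_static_text(html, tag, class_name):
    """Return the stripped text of each matching element in the raw HTML."""
    finder = ClassFinder(tag, class_name)
    finder.feed(html)
    finder.close()
    return [text.strip() for text in finder.matches]
```

If find_static_text(html, "h1", "a2s-title-content") returns a non-empty list for the raw response, the element is served statically and a browser is not needed. An empty list means the element is either rendered by JavaScript or the server returned a different page.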

Upvotes: 1
