JulianP

Reputation: 97

Why does BeautifulSoup return empty list on search results websites?

I'm trying to get the price of a specific item online, but I can't select the elements under a tag on this page, even though the same approach works on a different page of the same website. On this page I only get an empty list, although printing soup.text works. I'd rather not use Selenium if possible, because I want to understand how BS4 handles this kind of case.

import requests
from bs4 import BeautifulSoup
url = 'https://reverb.com/p/electro-harmonix-oceans-11-reverb-2018'

r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
cards = soup.select(".product-row-card")
print(cards)
# Output: []

What I would like to get is the name and price of each card on the page. I've run into this problem before, but every solution here only suggests using Selenium (which I could get working) without explaining why. I also find it less practical.

Also, I've read that the website might be using JavaScript to fetch these results. If that's the case, why could I fetch the data from https://reverb.com/price-guide/effects-and-pedals but not from this page? Would Selenium be the only solution in that case?

Upvotes: 1

Views: 442

Answers (1)

cody

Reputation: 11157

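You are correct that the site you're targeting relies on JavaScript to render the data you're trying to obtain. The issue is that requests does not evaluate JavaScript, so the HTML it returns does not yet contain the .product-row-card elements and soup.select() has nothing to match.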
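
One way to check this without a browser is to count how often the class appears in the HTML that was actually received. The following is a minimal sketch using only the standard library; ClassCounter, count_class, and the two inline HTML strings are hypothetical stand-ins for illustration, not Reverb's real markup.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    """Return how many elements in the HTML string carry class_name."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical page as the server sends it: an empty container plus a script.
STATIC_HTML = '<div id="app"></div><script src="bundle.js"></script>'

# Hypothetical page after a browser has run the script and filled the container.
RENDERED_HTML = (
    '<div id="app">'
    '<div class="product-row-card featured">A</div>'
    '<div class="product-row-card">B</div>'
    '</div>'
)

static_count = count_class(STATIC_HTML, "product-row-card")
rendered_count = count_class(RENDERED_HTML, "product-row-card")
print(static_count, rendered_count)
```

Passing r.text from your own requests.get(url) call to count_class should tell you the same thing for a real page: a count of zero suggests the elements are added later by JavaScript, while a non-zero count suggests plain requests plus BeautifulSoup is enough for that page.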

You're also correct that Selenium WebDriver is often used in these situations, as it drives a real, full-blown browser instance. But it's not the only option: requests-html has JavaScript support and is perhaps less cumbersome for simple scraping.

As an example to get you started, the following gets the title and price of the first five items on the site you're accessing:

from requests_html import HTMLSession
from bs4 import BeautifulSoup

session = HTMLSession()
r = session.get("https://reverb.com/p/electro-harmonix-oceans-11-reverb-2018")
# Run the page's JavaScript in headless Chromium (downloaded on first use),
# then wait 5 seconds so the listings have time to load.
r.html.render(sleep=5)

# Parse the rendered HTML rather than the original response body.
soup = BeautifulSoup(r.html.raw_html, "html.parser")
for item in soup.select(".product-row-card", limit=5):
    title = item.select_one(".product-row-card__title__text").text.strip()
    price = item.select_one(".product-row-card__price__base").text.strip()
    print(f"{title}: {price}")

Result:

Electro-Harmonix EHX Oceans 11 Eleven Reverb Hall Spring Guitar Effects Pedal: $119.98
Electro-Harmonix Oceans 11 Reverb - Used: $119.99
Electro-Harmonix Oceans 11 Multifunction Digital Reverb Effects Pedal: $122
Pre-Owned Electro-Harmonix Oceans 11 Reverb Multi Effects Pedal Used: $142.27
Electro-Harmonix Oceans 11 Reverb Matte Black: $110

Upvotes: 3
