swingcake

Reputation: 121

BeautifulSoup find_all() returns nothing []

I'm trying to scrape all the offers from this page and want to iterate over each <p class="white-strip"> element, but page_soup.find_all("p", "white-strip") returns an empty list [].

My code so far:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.sbicard.com/en/personal/offers.page#all-offers'

# Opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# html parsing
page_soup = soup(page_html, "lxml")

Edit: I got it working using Selenium; the code I used is below. However, I still can't figure out another way to do the same thing.

from bs4 import BeautifulSoup
from selenium import webdriver

# raw string so the backslashes in the Windows path are not read as escape sequences
driver = webdriver.Chrome(r"C:\chromedriver_win32\chromedriver.exe")
driver.get('https://www.sbicard.com/en/personal/offers.page#all-offers')

# html parsing
page_soup = BeautifulSoup(driver.page_source, 'lxml')

# grabs each offer
containers = page_soup.find_all("p", {'class':"white-strip"})

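filename = "offers.csv"
header = "offer-list\n"

# the with block closes the file even if a container has no <span>
with open(filename, "w", encoding="utf-8") as f:
    f.write(header)
    for container in containers:
        offer = container.span.text
        f.write(offer + "\n")

# quit() ends the whole browser session, not just the current window
driver.quit()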
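
One thing to watch in the file-writing loop above: an offer text that contains a comma will be split into extra columns when the CSV is opened. A minimal sketch using the standard csv module, which quotes such values, might look like this. The write_offers helper and the sample strings are made up for illustration; in the real script the list would be built from container.span.text.

```python
import csv
import io


def write_offers(offers, out_file):
    """Write one offer per row under an 'offer-list' header (hypothetical helper)."""
    writer = csv.writer(out_file)
    writer.writerow(["offer-list"])
    for offer in offers:
        # strip() drops stray whitespace around the scraped text
        writer.writerow([offer.strip()])


# Made-up sample data standing in for the scraped offer texts
sample_offers = ["Upto Rs 8,000 off on flights", " Get 5% cashback "]

# An in-memory buffer keeps the sketch self-contained; for a real file use
# open("offers.csv", "w", newline="", encoding="utf-8") instead
buffer = io.StringIO()
write_offers(sample_offers, buffer)
print(buffer.getvalue())
```

The value with a comma comes out wrapped in double quotes, so spreadsheet programs keep it in a single column.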

Upvotes: 1

Views: 2085

Answers (2)

bharatk

Reputation: 4315

The website renders its data dynamically. Try the Selenium automation library; it lets you scrape page data that is rendered dynamically through JavaScript or AJAX requests.

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome("/usr/bin/chromedriver")
driver.get('https://www.sbicard.com/en/personal/offers.page#all-offers')

page_soup = BeautifulSoup(driver.page_source, 'lxml')
p_list = page_soup.find_all("p", {'class':"white-strip"})

print(p_list)

where '/usr/bin/chromedriver' is the path to the Selenium web driver.
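
To check whether a page is rendered dynamically before reaching for Selenium, you can count the target tags in the raw HTML, i.e. what urlopen or requests would return. This is only a rough sketch using the standard library parser; the TagCounter class, the count_static_tags helper and the SAMPLE_HTML string are assumptions made for illustration, not the real page.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start tags of a given name that carry a given CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.count += 1


def count_static_tags(html, tag, css_class):
    """Return how many matching tags exist before any JavaScript has run."""
    parser = TagCounter(tag, css_class)
    parser.feed(html)
    return parser.count


# Hypothetical stand-in for the raw response: the container is empty and the
# offers only exist as data inside a script
SAMPLE_HTML = """
<html><body>
<div id="offers"></div>
<script>var offerData={"offers":{"offer":[{"text":"Get 5% cashback"}]}};</script>
</body></html>
"""

print(count_static_tags(SAMPLE_HTML, "p", "white-strip"))
print("offerData" in SAMPLE_HTML)
```

A count of zero together with the data showing up inside a script suggests the tags are created in the browser, which is why find_all() on the raw HTML returns [].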

Download the Selenium web driver for the Chrome browser:

http://chromedriver.chromium.org/downloads

Install the web driver for the Chrome browser:

https://christopher.su/2015/selenium-chromedriver-ubuntu/

Selenium tutorial:

https://selenium-python.readthedocs.io/

Upvotes: 1

SIM

Reputation: 22440

If you search the page source for any of the offers, you will find them inside a script tag containing var offerData. To extract the desired content from that script, you can try the following.

import re
import json
import requests

url = "https://www.sbicard.com/en/personal/offers.page#all-offers"

res = requests.get(url)
p = re.compile(r"var offerData=(.*?);",re.DOTALL)
script = p.findall(res.text)[0].strip()
items = json.loads(script)
for item in items['offers']['offer']:
    print(item['text'])

The output looks like:

Upto Rs 8000 off on flights at Yatra
Electricity Bill payment – Phonepe Offer
25% off on online food ordering
Get 5% cashback at Best Price stores
Get 5% cashback
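
One caveat with the non-greedy pattern above: it stops at the first semicolon, so it would cut the data short if any offer text itself contained a ";". A possible alternative, sketched here under the assumption that the assigned value is strict JSON (which json.loads above already requires), is to let json.JSONDecoder.raw_decode find where the object ends. The extract_js_object helper and the SAMPLE string are hypothetical and only mimic the structure shown in this answer.

```python
import json
import re


def extract_js_object(page_text, var_name):
    """Return the JSON value assigned to `var_name` in inline JavaScript."""
    # The trailing \s* matters: raw_decode does not skip leading whitespace
    match = re.search(r"var\s+" + re.escape(var_name) + r"\s*=\s*", page_text)
    if match is None:
        raise ValueError(f"{var_name} not found in page text")
    # raw_decode parses exactly one JSON value starting at the given index
    # and ignores whatever follows it (the closing ';', '</script>', ...)
    value, _end = json.JSONDecoder().raw_decode(page_text, match.end())
    return value


# Made-up sample; note the semicolon inside the first offer text
SAMPLE = (
    '<script>var offerData={"offers":{"offer":['
    '{"text":"25% off; online food ordering"},'
    '{"text":"Get 5% cashback"}]}};</script>'
)

data = extract_js_object(SAMPLE, "offerData")
texts = [item["text"] for item in data["offers"]["offer"]]
print(texts)
```

With the real page you would pass res.text instead of SAMPLE.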

Upvotes: 1
