Moran Reznik

Reputation: 93

.findAll() finds nothing from webpage

To pull reviews from the Google Play Store, I am trying to learn the Beautiful Soup library. I wrote code that should get me all the reviews (including the star rating, date, and reviewer name), but the output is just an empty list. The problem is probably something very basic that I am too inexperienced to spot.

from urllib.request import urlopen
from bs4 import BeautifulSoup as soup
my_url = 'https://play.google.com/store/apps/details?id=com.playstudios.popslots&showAllReviews=true'
uclient = urlopen(my_url)
page_html = uclient.read()
uclient.close()
page_soup = soup(page_html, "html.parser")
reviews = page_soup.findAll("div",{"class":'d15Mdf bAhLNe'})
print(len(reviews))

The output is 0.

What should I do in order to fix this?

Upvotes: 0

Views: 76

Answers (2)

QHarr

Reputation: 84465

You need to scroll to get all the reviews, which requires browser automation, e.g. Selenium (the POST request that does the batch updates doesn't look easy to replicate).

If you only want the page 1 reviews (those loaded before scrolling), you can regex them out (my regex isn't good enough to get them in one go):

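import re

import requests

url = "https://play.google.com/store/apps/details?id=com.playstudios.popslots&showAllReviews=true"
r = requests.get(url)
# Grab everything between a review ID hardcoded from the page at the time of writing
# and the end of the enclosing <script> block.
p = re.compile(r'gp:AOqpTOH5kmss3scHG0QoYWgIF-BGIBxKlo-1-KRNg2GEzHXfpccogYalrSCBLbjLp-Y4h-T69r-4nFVYuea8Zg",(.*)\);</script><script aria-hidden="true"', re.DOTALL)
data = p.findall(r.text)[0]
# Second pass: collect the quoted strings from that blob.
p2 = re.compile(r'"(.*?)",|\d{21}')
items = p2.findall(data)
for x, i in enumerate(items):
    # When an item contains a 21-digit number, print the two items before it as a pair.
    if re.search(r'\d{21}', i):
        print(items[x - 2], ' : ', items[x - 1])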

Upvotes: 0

9000

Reputation: 40894

Because the class you're looking for is not in the HTML the server returns:

curl 'https://play.google.com/store/apps/details?id=com.playstudios.popslots&showAllReviews=true' | grep 'd15Mdf bAhLNe'

Almost the entire <body> is produced by JavaScript running in the browser, including, I suppose, all the interesting bits you're looking for.
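
The same check can be sketched in Python using only the standard library. The names `ClassCounter`, `count_class` and `fetch_static_html` are made up for illustration, and the two inline HTML strings are invented stand-ins for "what the server sends" and "what the browser builds after running scripts"; they are not copied from the actual page.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute equals the target string exactly."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if dict(attrs).get("class") == self.target_class:
            self.count += 1


def count_class(html, target_class):
    """Return how many elements in html carry exactly this class attribute."""
    parser = ClassCounter(target_class)
    parser.feed(html)
    return parser.count


def fetch_static_html(url):
    """Return the HTML as the server sends it; no JavaScript is executed."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


# Invented examples: a script-only shell versus a DOM that already has a review div.
static_shell = "<html><body><script>render()</script></body></html>"
rendered_dom = '<html><body><div class="d15Mdf bAhLNe">Great game</div></body></html>'

print(count_class(static_shell, "d15Mdf bAhLNe"))  # 0
print(count_class(rendered_dom, "d15Mdf bAhLNe"))  # 1
```

Calling `count_class(fetch_static_html(my_url), "d15Mdf bAhLNe")` on the real URL is the Python counterpart of the `curl | grep` line, with the difference that it matches the class attribute rather than any occurrence of the text.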

If you want to try and scrape such a page, look for scrapers that actually run JavaScript (usually in Chrome running in headless mode).
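
A minimal sketch of that approach, assuming Selenium with a local Chrome install and Beautiful Soup are available. `build_reviews_url` and `fetch_rendered_reviews` are hypothetical helper names, the 15-second wait is an arbitrary choice, and the class names are the ones from the question, which Google may have changed since. It only returns whatever reviews are present after the first render; it does not scroll to load more.

```python
from urllib.parse import urlencode


def build_reviews_url(app_id):
    """Build a details URL of the same shape as the one in the question."""
    query = urlencode({"id": app_id, "showAllReviews": "true"})
    return "https://play.google.com/store/apps/details?" + query


def fetch_rendered_reviews(app_id, review_class="d15Mdf bAhLNe", wait_seconds=15):
    """Load the page in headless Chrome and return the review elements found."""
    # Third-party imports live here so the module still loads without them.
    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    # "d15Mdf bAhLNe" becomes the CSS selector "div.d15Mdf.bAhLNe" (both classes required).
    selector = "div." + ".".join(review_class.split())
    try:
        driver.get(build_reviews_url(app_id))
        # Wait until JavaScript has inserted at least one matching element.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, selector))
        )
        html = driver.page_source  # the DOM after scripts have run
    finally:
        driver.quit()

    return BeautifulSoup(html, "html.parser").select(selector)
```

Usage would look like `reviews = fetch_rendered_reviews("com.playstudios.popslots")` followed by `print(len(reviews))`. If the class names no longer exist on the page, the wait raises a `TimeoutException` instead of silently returning an empty list.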

Upvotes: 1
