Reputation: 93
To pull reviews from the Google Play Store, I am trying to learn the Beautiful Soup library. I wrote code that should get all the reviews (including the star rating, date, and reviewer name), but the output is just an empty list. The problem is probably something very basic that I am too inexperienced to spot.
from urllib.request import urlopen
from bs4 import BeautifulSoup as soup
my_url = 'https://play.google.com/store/apps/details?id=com.playstudios.popslots&showAllReviews=true'
uclient = urlopen(my_url)
page_html = uclient.read()
uclient.close()
page_soup = soup(page_html, "html.parser")
reviews = page_soup.findAll("div",{"class":'d15Mdf bAhLNe'})
len(reviews)
The output is 0.
What should I do in order to fix this?
Upvotes: 0
Views: 76
Reputation: 84465
You need to scroll to get all the reviews, which requires browser automation, e.g. Selenium (the POST request that does the batch updates doesn't look easy to replicate).
If you only want the first page of reviews (the ones present before any scrolling), you can pull them out with regular expressions (my regex isn't good enough to do it in one go):
import requests
import re
url = "https://play.google.com/store/apps/details?id=com.playstudios.popslots&showAllReviews=true"
r = requests.get(url)
p = re.compile(r'gp:AOqpTOH5kmss3scHG0QoYWgIF-BGIBxKlo-1-KRNg2GEzHXfpccogYalrSCBLbjLp-Y4h-T69r-4nFVYuea8Zg",(.*)\);</script><script aria-hidden="true"', re.DOTALL)
data = p.findall(r.text)[0]
p2 = re.compile(r'"(.*?)",|\d{21}')
items = p2.findall(data)
x = 0
for i in items:
    if re.search(r'(\d{21})', i):
        #print(i)
        print(items[x-2], ' : ', items[x-1])
    x += 1
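To see how the second pattern and the index arithmetic fit together, here is a small offline sketch. The sample string is made up and is not real Play Store data, and the helper name `pair_before_id` is hypothetical. It assumes, as the loop above does, that each 21-digit ID is preceded by the two quoted fields of interest.

```python
import re

# Same second-stage pattern as above: a quoted field followed by a comma,
# or a bare 21-digit number (which yields an empty string in findall).
FIELD = re.compile(r'"(.*?)",|\d{21}')
ID = re.compile(r'\d{21}')


def pair_before_id(data):
    """Return the two quoted fields that precede each 21-digit ID."""
    items = FIELD.findall(data)
    pairs = []
    for index, item in enumerate(items):
        # Require two earlier fields so that negative indexing never
        # wraps around to the end of the list.
        if index >= 2 and ID.search(item):
            pairs.append((items[index - 2], items[index - 1]))
    return pairs


# Made-up sample, NOT the real page payload.
sample = (
    '"Alice","Great game","123456789012345678901",'
    '"Bob","Too many ads","987654321098765432109",'
)

for first, second in pair_before_id(sample):
    print(first, ' : ', second)
```

Using `enumerate` replaces the manual counter, and the `index >= 2` guard is an addition that the original loop does not have.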
Upvotes: 0
Reputation: 40894
Because the class you're looking for is not there.
curl 'https://play.google.com/store/apps/details?id=com.playstudios.popslots&showAllReviews=true' | grep 'd15Mdf bAhLNe'
Almost the entire <body> is produced by JavaScript running in the browser, including, I suppose, all the interesting bits you're looking for.
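The same check can be sketched in Python with only the standard library. The two HTML strings below are hypothetical stand-ins (one for what the server sends, one for what the browser builds after running scripts), and `ClassCounter` and `count_class` are names invented for this illustration.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains all wanted classes."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted.split())
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # A valueless class attribute is reported as None, hence the `or ""`.
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.count += 1


def count_class(html, wanted):
    """Return how many elements in html carry every class named in wanted."""
    parser = ClassCounter(wanted)
    parser.feed(html)
    return parser.count


# Hypothetical stand-ins, not copies of the real page.
static_html = '<body><script>render()</script></body>'
rendered_html = '<body><div class="d15Mdf bAhLNe">review</div></body>'

print(count_class(static_html, "d15Mdf bAhLNe"))
print(count_class(rendered_html, "d15Mdf bAhLNe"))
```

If the count is zero for the HTML fetched without a browser, no parser will find those elements in it, whichever library is used.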
If you want to try and scrape such a page, look for scrapers that actually run JavaScript (usually in Chrome running in headless mode).
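A rough sketch of that approach, assuming Selenium 4 and a local Chrome install (neither is confirmed by the thread). The helper names `scroll_until_stable` and `fetch_rendered_html` are hypothetical. It also assumes the reviews load when the window itself is scrolled; if the page puts them in a separate scrollable container, the two scripts would need to target that element instead.

```python
import time


def scroll_until_stable(get_height, scroll_down, max_rounds=20, pause=0.0):
    """Scroll until the page height stops growing or max_rounds is reached.

    Returns the number of scrolls performed.
    """
    last = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll_down()
        time.sleep(pause)  # give the page time to load the next batch
        current = get_height()
        if current == last:
            return rounds
        last = current
    return max_rounds


def fetch_rendered_html(url, pause=2.0):
    """Load url in headless Chrome, scroll to the bottom, return the HTML.

    Selenium is imported lazily so the scrolling helper above can be
    used and tested without it.
    """
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # older Chrome: "--headless"
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        scroll_until_stable(
            lambda: driver.execute_script("return document.body.scrollHeight"),
            lambda: driver.execute_script(
                "window.scrollTo(0, document.body.scrollHeight);"
            ),
            pause=pause,
        )
        return driver.page_source
    finally:
        driver.quit()
```

The string returned by `fetch_rendered_html` could then be handed to Beautiful Soup as in the question. Class names such as `d15Mdf bAhLNe` look generated, so they may change without notice.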
Upvotes: 1