user13215247

Scraping Google Play reviews

I am new to programming and recently tried to scrape Google Play reviews with Python, using the following program:

from bs4 import BeautifulSoup
import urllib.request

url = input("Enter URL: ")
open_url = urllib.request.urlopen(url)

soup = BeautifulSoup(open_url, "html.parser")

reviews = []
for i in soup.find_all("div", {"jscontroller" : "X"}, {"class" : "X"}):
    per_review = i.find("X")
    reviews.append(per_review)

print(reviews)  

The problem is in this section:

for i in soup.find_all("div", {"jscontroller" : "X"}, {"class" : "X"}):
    per_review = i.find("X")
    reviews.append(per_review) 

I have tried many parent nodes, as well as the nodes that directly contain the reviews, but the output is always an empty list. Could somebody demonstrate how to achieve what I was intending? Thanks.

Edit

For example, if I use the URL for Super Mario Run with the following parameters:

reviews = []
for i in soup.find_all("div", {"jscontroller" : "LVJlx"}, {"class" : "UD7Dzf"}):
    per_review = i.find("span")
    reviews.append(per_review)

print(reviews)    

The output is an empty list.

Upvotes: 1

Views: 577

Answers (1)

NomadMonad

Reputation: 649

The jscontroller and class values won't be consistent across different URLs. You could try something like:

soup.find_all('div', {'jscontroller': True}) 

But that will not give you all the reviews, because more of them are added dynamically as you scroll down the page.
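As a minimal sketch of the attribute matching above: passing True as a value matches any tag that has that attribute, whatever its value. The HTML below is made up for illustration and is not the real Google Play markup. Note also that all attribute filters belong in a single dict; in find_all the third positional parameter is recursive, so a second dict passed positionally (as in the question) is not applied as a filter.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only; the real Google Play page differs.
SAMPLE_HTML = """
<div jscontroller="abc" class="review"><span>Great game</span></div>
<div jscontroller="def" class="other"><span>Not a review</span></div>
<div class="review"><span>No controller</span></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# One dict holds every attribute filter. True means "attribute is present".
matches = soup.find_all("div", {"jscontroller": True, "class": "review"})

# Collect the text of the first <span> inside each matching <div>.
reviews = [div.find("span").get_text(strip=True) for div in matches]
print(reviews)
```

Only the first div satisfies both conditions here, so only its span text is collected.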

That means you need to scrape the page with an actual browser, or you can try to reverse-engineer the API calls using your browser's Developer Tools.
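The browser route could look roughly like the sketch below. It assumes Selenium as the browser automation library (my choice, not something the question uses), a local Chrome installation with a working driver, and that scrolling the window is what triggers more reviews to load. The function names, scroll count, and pause length are all hypothetical and would need adjusting for the real page.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, scrolls=5, pause=2.0):
    """Load url in Chrome, scroll down a few times, return the rendered HTML."""
    # Imported here so the parsing helper below works without Selenium.
    import time
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        for _ in range(scrolls):
            # Scroll to the bottom so the page can load more content.
            driver.execute_script(
                "window.scrollTo(0, document.body.scrollHeight);"
            )
            time.sleep(pause)  # crude wait; assumed to be long enough
        return driver.page_source
    finally:
        driver.quit()


def extract_span_texts(html):
    """Return the text of the first <span> in each div that has a jscontroller."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for div in soup.find_all("div", {"jscontroller": True}):
        span = div.find("span")
        if span is not None:
            texts.append(span.get_text(strip=True))
    return texts
```

Usage would be along the lines of html = fetch_rendered_html(url) followed by print(extract_span_texts(html)). Because divs with a jscontroller attribute can be nested, the result may contain duplicates or non-review text, so some filtering is likely still needed.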

Upvotes: 1