I am new to programming and recently tried to scrape Google Play reviews with Python using the following program:
from bs4 import BeautifulSoup
import urllib.request
url = input("Enter URL: ")
open_url = urllib.request.urlopen(url)
soup = BeautifulSoup(open_url, "html.parser")
reviews = []
for i in soup.find_all("div", {"jscontroller" : "X"}, {"class" : "X"}):
    per_review = i.find("X")
    reviews.append(per_review)
print(reviews)
The problem is in this section:
for i in soup.find_all("div", {"jscontroller" : "X"}, {"class" : "X"}):
    per_review = i.find("X")
    reviews.append(per_review)
I have tried many parent nodes as well as the nodes that directly contain the reviews, but the output is always an empty list. Could somebody demonstrate how to achieve what I intended? Thanks.
Edit
For example, if I use the URL for Super Mario Run with the following parameters:
reviews = []
for i in soup.find_all("div", {"jscontroller" : "LVJlx"}, {"class" : "UD7Dzf"}):
    per_review = i.find("span")
    reviews.append(per_review)
print(reviews)
The output is an empty list.
Upvotes: 1
Views: 577
Reputation: 649
The jscontroller and class values won't be consistent across different URLs. You could try something like
soup.find_all('div', {'jscontroller': True})
But that will not give you all the reviews as they are dynamically added when you scroll down the page.
That means you need to scrape the page with an actual browser or you can try to reverse engineer the API calls by using Developer Tools.
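To illustrate the attribute-presence match above, here is a minimal sketch that runs against a small hand-written HTML string rather than the live page. The markup, the "review" class name, and the extract_review_texts helper are all made up for illustration; they are not Google Play's real structure. It also puts both attribute filters into a single dict, because as far as I know the third positional argument of find_all is recursive, so a second dict passed there would not act as a class filter.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a saved page; the real Google Play
# HTML differs and its attribute values change between pages.
SAMPLE_HTML = """
<div jscontroller="abc123" class="review"><span>Great game</span></div>
<div jscontroller="def456" class="review"><span>Too many ads</span></div>
<div class="footer"><span>Not a review</span></div>
"""


def extract_review_texts(html):
    """Return the text of the first <span> in each matching <div>."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    # One attrs dict holds both filters: True means "attribute is present,
    # whatever its value"; "review" is an assumed class name.
    for div in soup.find_all("div", {"jscontroller": True, "class": "review"}):
        span = div.find("span")
        if span is not None:  # skip divs without a span
            texts.append(span.get_text(strip=True))
    return texts


reviews = extract_review_texts(SAMPLE_HTML)
print(reviews)
```

This only covers markup that is already in the downloaded HTML. Reviews loaded while scrolling would still require a real browser or the underlying API calls.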
Upvotes: 1