user2293224

Reputation: 2220

Python BeautifulSoup: Retrieving review related information from Google Play Store

I am writing a program to retrieve information about the reviews that users post on the Google Play Store: the reviewer's name, the rating, the review date, the likes or dislikes on the review, and the review text. I am using BeautifulSoup for this, but I am having trouble retrieving that information. For example, I want to retrieve the review information from the following page:

https://play.google.com/store/apps/details?id=com.education.educationkids&hl=en&showAllReviews=true

Here is the code of my program:

import urllib.request
import bs4 as bs
html = urllib.request.urlopen('https://play.google.com/store/apps/details?id=com.education.educationkids&hl=en&showAllReviews=true').read()
soup = bs.BeautifulSoup(html, 'html.parser')

I want to retrieve the information mentioned above. When I inspected the page, I found that the div named "fk8dgd" contains all the review information (as shown in the picture).

To retrieve the review text, I used the following command:

soup.find('div',{'jscontroller':'H6eOGe'}).get_text()

However, the command throws an error:

AttributeError: 'NoneType' object has no attribute 'get_text'

I am not sure where I am making the mistake. Could anyone help me fix the issue?

Upvotes: 2

Views: 814

Answers (1)

user12624957

The reason is that the review HTML is rendered by JavaScript after the page loads in a browser, so the response fetched with urllib does not contain that div and find() returns None.
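To illustrate, here is a minimal sketch that runs the same lookup against two small HTML strings. Both strings are made-up stand-ins, not real Play Store markup: RAW_HTML imitates a response whose reviews have not been rendered yet, and RENDERED_HTML imitates the page after the browser has filled them in. The helper name review_text is also only an assumption for this example.

```python
import bs4 as bs

# Hypothetical stand-ins for the page before and after JavaScript has run.
RAW_HTML = '<html><body><div id="app"></div></body></html>'
RENDERED_HTML = (
    '<html><body><div id="app">'
    '<div jscontroller="H6eOGe"><span>Great app for kids</span></div>'
    '</div></body></html>'
)


def review_text(html):
    """Return the text of the review div, or None if the div is absent."""
    soup = bs.BeautifulSoup(html, 'html.parser')
    node = soup.find('div', {'jscontroller': 'H6eOGe'})
    # find() returns None when nothing matches, so check before calling get_text()
    if node is None:
        return None
    return node.get_text(strip=True)


print(review_text(RAW_HTML))       # None
print(review_text(RENDERED_HTML))  # Great app for kids
```

Calling get_text() directly on the first result is what raises the AttributeError from the question; checking for None first makes the missing element visible instead of crashing.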

The code below loads the page fully with Selenium and then searches the rendered source with BeautifulSoup:

import bs4 as bs
from selenium import webdriver

# Start a Chrome session (needs Chrome and a matching driver installed)
driver = webdriver.Chrome()

# Load the page in a real browser so that its JavaScript runs
driver.get('https://play.google.com/store/apps/details?id=com.education.educationkids&hl=en&showAllReviews=true')

# Parse the rendered page source instead of the raw urllib response
soup = bs.BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

print(soup.find('div',{'jscontroller':'H6eOGe'}).get_text())
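If the reviews have not finished rendering when driver.page_source is read, find() can still return None. One possible way to handle that is an explicit wait, sketched below. The function names fetch_rendered_html and review_selector, and the 10 second timeout, are assumptions made for this example. The jscontroller value comes from the question and may no longer match the live page.

```python
def review_selector(jscontroller='H6eOGe'):
    """Build a CSS selector for the review div (attribute value taken from the question)."""
    return 'div[jscontroller="{}"]'.format(jscontroller)


def fetch_rendered_html(url, timeout=10):
    """Open url in Chrome and return the page source once a review div is present."""
    # Imported here so the selector helper above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Blocks until the element exists; raises TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, review_selector()))
        )
        return driver.page_source
    finally:
        # Close the browser even if the wait times out.
        driver.quit()
```

The returned string can be passed to bs.BeautifulSoup(html, 'html.parser') in the same way as driver.page_source.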

Upvotes: 3
