Reputation: 2220
I am writing a program to retrieve information about the reviews that users post on the Google Play Store: the reviewer name, the rating, the date, the likes or dislikes of the review, and the review text. I am using BeautifulSoup for this, but I am having trouble retrieving this information. To illustrate with an example, I want to retrieve the review information from the following link:
https://play.google.com/store/apps/details?id=com.education.educationkids&hl=en&showAllReviews=true
Here is the code of my program:
import urllib.request
import bs4 as bs
html = urllib.request.urlopen('https://play.google.com/store/apps/details?id=com.education.educationkids&hl=en&showAllReviews=true').read()
soup = bs.BeautifulSoup(html, 'html.parser')
I want to retrieve the information mentioned above. When I inspected the page, I found that the div named "fk8dgd" contains all the review-related information (as shown in the picture).
To retrieve the text of the review, I used the following command:
soup.find('div',{'jscontroller':'H6eOGe'}).get_text()
However, the command throws an error:
AttributeError: 'NoneType' object has no attribute 'get_text'
I am not sure where I am making the mistake. Could anyone help me fix the issue?
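For reference, the error can be reproduced offline: `find` returns `None` when nothing in the parsed markup matches, and calling `get_text()` on `None` raises the `AttributeError`. A minimal sketch, assuming a made-up HTML snippet in place of the real Play Store response:

```python
import bs4 as bs

# Made-up markup standing in for the raw response: the review <div> is absent,
# as it would be if the page only adds it later with JavaScript.
raw_html = "<html><body><div id='app'></div></body></html>"
soup = bs.BeautifulSoup(raw_html, "html.parser")

node = soup.find("div", {"jscontroller": "H6eOGe"})
print(node)  # None: nothing in the parsed markup matches

# Guarding against None avoids the AttributeError and makes the cause visible.
review_text = node.get_text() if node is not None else None
print(review_text)
```

Printing the result of `find` before chaining `get_text()` is a quick way to check whether the element exists in the HTML that was actually downloaded.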
Upvotes: 2
Views: 814
Reputation:
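The reason is that the review HTML is rendered by JavaScript after the page loads in a browser, so the HTML that urllib.request downloads does not contain that div and soup.find returns None.
The code below loads the page completely through Selenium and then finds its contents with BeautifulSoup.
Here is the code: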
import bs4 as bs
from selenium import webdriver
# Open the page in a real browser so that its JavaScript runs
driver = webdriver.Chrome()
driver.get('https://play.google.com/store/apps/details?id=com.education.educationkids&hl=en&showAllReviews=true')
# html = urllib.request.urlopen('https://play.google.com/store/apps/details?id=com.education.educationkids&hl=en&showAllReviews=true').read()
# Parse the HTML as rendered by the browser instead of the raw response
soup = bs.BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()
print(soup.find('div',{'jscontroller':'H6eOGe'}).get_text())
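One caveat: if `driver.page_source` is read before the reviews have been rendered, `find` can still return `None`. A hedged sketch that waits for the element first with Selenium's `WebDriverWait`. The helper names `fetch_rendered_html` and `first_review_text` and the 15-second timeout are illustrative choices, and it assumes the page still uses the `jscontroller` value `H6eOGe` from the question:

```python
import bs4 as bs

# Attribute value taken from the question; the live page may have changed it.
REVIEW_ATTRS = {"jscontroller": "H6eOGe"}


def fetch_rendered_html(url, timeout=15):
    """Load url in Chrome and return the HTML once a review div exists."""
    # Imported here so the parsing helper below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until the JavaScript-rendered element is in the DOM (or time out).
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, "div[jscontroller='H6eOGe']")
            )
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if the wait times out


def first_review_text(html):
    """Return the text of the first review div, or None if it is missing."""
    soup = bs.BeautifulSoup(html, "html.parser")
    node = soup.find("div", REVIEW_ATTRS)
    return node.get_text(strip=True) if node is not None else None
```

With the URL from the question, `first_review_text(fetch_rendered_html(url))` would return the first review's text. If the element never appears, the wait raises Selenium's `TimeoutException` instead of silently returning an incomplete page.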
Upvotes: 3