Reputation: 37721
I am modifying the play-scraper API to scrape Play Store app details. It uses BeautifulSoup to parse HTML pages [reference].
I am particularly interested in all the additional information available for an app as shown in the screenshot below. (The above screenshot is taken from this app.)
I am stuck at extracting the list of permissions that an app asks for (shown in the above figure), because the View details link under Permissions is as follows:
<a class="hrTbp" jsname="Hly47e">View details</a>
Clicking the View details link shows a list of permissions (screenshot as follows) that I want to extract.
I am not familiar with JavaScript. Any help would be appreciated.
Upvotes: 0
Views: 131
Reputation: 2375
If I understand the question correctly, you are trying to scrape data from a modal. When the page first loads, the modal's data is not present in the HTML; it is fetched only after you click the View details button. That is why the parser does not find the data inside the modal, which in your case is the permission information.
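To see why a static parse comes up empty, here is a minimal sketch that feeds BeautifulSoup only the anchor quoted in the question. The variable names are my own, and I use the built-in "html.parser" so the snippet runs without lxml. The point is that the tag has no href to follow; its behavior is attached by JavaScript.

```python
from bs4 import BeautifulSoup

# The anchor exactly as quoted in the question.
static_html = '<a class="hrTbp" jsname="Hly47e">View details</a>'

soup = BeautifulSoup(static_html, "html.parser")
link = soup.find("a", string="View details")

# There is no href attribute, so there is no URL to request for the details.
has_href = link.has_attr("href")
print(has_href)            # False
print(link.get("jsname"))  # the only hook is a JavaScript handle
```

So requesting the page and parsing it is not enough; something has to execute the click.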
As for a solution, one option is to use Selenium with chromedriver: perform a click event on the View details text, then fetch the modal data. Have a look at this link to get an idea.
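Update: To get an idea of the solution using Selenium and chromedriver, consider the following code (written for Selenium 4; Selenium 3 used find_element_by_link_text and a positional driver path):
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--headless")  # run Chrome without a visible window
driver = webdriver.Chrome(service=Service('local_path_to_chrome_driver'), options=options)
driver.get(url_of_the_play_store_app)
time.sleep(5)  # wait 5 seconds for the page to load
driver.find_element(By.LINK_TEXT, "View details").click()  # perform the click event
time.sleep(5)  # wait another 5 seconds for the modal data to load
soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()  # close the browser once the page source is captured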
The soup variable now holds the updated page source, including the modal window's data, which you can retrieve from soup.
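As a rough sketch of that last step: I have not reproduced the real modal markup here, so the HTML string, the div.modal container, and the extract_permissions helper below are all hypothetical stand-ins. You would inspect the actual page in your browser and adjust the selector accordingly.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source; the real Play Store markup
# and class names will differ and must be checked in the browser inspector.
page_source = """
<div class="modal">
  <h3>Permissions</h3>
  <ul>
    <li>read the contents of your USB storage</li>
    <li>full network access</li>
  </ul>
</div>
"""

def extract_permissions(html):
    """Return the text of every <li> inside the (assumed) div.modal container."""
    soup = BeautifulSoup(html, "html.parser")
    return [li.get_text(strip=True) for li in soup.select("div.modal li")]

permissions = extract_permissions(page_source)
print(permissions)
```

In the real script you would pass driver.page_source to the helper instead of the sample string.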
Upvotes: 2