Wasi Ahmad

Reputation: 37721

Retrieving text content from a JavaScript URL

I am modifying the play-scraper API to scrape play-store app details. It uses BeautifulSoup to parse HTML pages [reference].

I am particularly interested in the additional information available for an app, as shown in the screenshot below (taken from this app).



I am stuck at extracting the list of permissions that an app asks for (shown in the above figure) because the View details link under Permissions is not a regular URL. Its markup is as follows.

<a class="hrTbp" jsname="Hly47e">View details</a>

Clicking the View details URL shows a list of permissions (screenshot as follows) that I want to extract.



I am not familiar with JavaScript. Any help would be appreciated.

Upvotes: 0

Views: 131

Answers (1)

Md Golam Rahman Tushar

Reputation: 2375

If I understand the question correctly, you are trying to scrape data from a modal. When the page first loads, the modal's data is not present in the HTML; it is fetched only after you click the View details button. That is why the parser cannot see the modal's contents, in your case the permission information.
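One way to see this is to inspect the anchor from the question: it carries no href attribute, so there is no URL for a static parser to follow. Here is a minimal sketch using only Python's standard-library html.parser; the class name AnchorInspector is made up for illustration.

```python
from html.parser import HTMLParser


class AnchorInspector(HTMLParser):
    """Collects the attributes of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchors.append(dict(attrs))


# The anchor exactly as it appears in the question.
snippet = '<a class="hrTbp" jsname="Hly47e">View details</a>'

inspector = AnchorInspector()
inspector.feed(snippet)

# True only if at least one anchor carries an href that could be fetched.
has_href = any("href" in attrs for attrs in inspector.anchors)
print(has_href)
```

Since the link has no target, the click has to be performed in a real browser session so the page's JavaScript can load the modal.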

As for a solution, one option is to use Selenium with chromedriver to click the View details text and then fetch the modal data. Have a look at this link to get an idea.

Update: To illustrate the Selenium and chromedriver approach, consider the following code:

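import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--headless")  # run Chrome without a visible window
driver = webdriver.Chrome(service=Service('local_path_to_chrome_driver'), options=options)  # Selenium 4 style

driver.get(url_of_the_play_store_app)
time.sleep(5)  # wait 5 seconds for the page to load
driver.find_element(By.LINK_TEXT, "View details").click()  # click to open the modal
time.sleep(5)  # wait another 5 seconds for the modal data to load
soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()  # close the browser once the page source is captured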

The soup variable now holds the updated page source, including the modal window's contents, so you can extract the permissions from it.
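As a rough sketch of that last step, assume the modal renders each permission as a list item inside a container. The HTML below and the class name permissions-modal are invented stand-ins, not the Play Store's actual markup, so inspect the real page source to find the right selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source after the modal has opened.
page_source = """
<div class="permissions-modal">
  <ul>
    <li>read the contents of your USB storage</li>
    <li>full network access</li>
  </ul>
</div>
"""

soup = BeautifulSoup(page_source, "html.parser")

# "permissions-modal" is an assumed class name; replace it with the real one.
modal = soup.find("div", class_="permissions-modal")
permissions = [li.get_text(strip=True) for li in modal.find_all("li")] if modal else []
print(permissions)
```

If the container is not found, the list is simply empty, which is a useful signal that the modal had not finished loading or that the selector needs adjusting.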

Upvotes: 2
