Reputation: 480
I would like to find the most downloaded apps from the previous month in the Google Play Store (the iTunes store would also work for me) and then perform exploratory data analysis on the data for these apps.
I am familiar with Python, but I don't know what approach to use to achieve this.
Upvotes: 0
Views: 522
Reputation: 46
To scrape top apps/games/movies/books from the Google Play Store with specific criteria (such as free, growing, or paid), you might consider using SearchApi. It returns the top 200 products in a single request.
Below is a Python code example for retrieving the top 200 free apps from the Google Play Store:
import requests
payload = {
    'api_key': 'your_api_key',
    'engine': 'google_play_store',
    'store': 'apps',
    'chart': 'topselling_free'
}
response = requests.get('https://www.searchapi.io/api/v1/search', params=payload)
print(response.text)
Here's an example of the JSON result for a single app:
{
"title":"Threads, an Instagram app",
"product_id":"com.instagram.barcelona",
"link":"https://play.google.com/store/apps/details?id=com.instagram.barcelona",
"rating":3.3,
"author":"Instagram",
"category":"Social",
"description":"Say more with Threads — Instagram’s text-based conversation app.Threads is where communities come together to discuss everything from the topics you care about today to what’ll be trending tomorrow. Whatever it is you’re interested in, you can follow and connect directly with your favorite creators and others who love the same things — or build a loyal following of your own to share your ideas, opinions and creativity with the world.A few things you can do on Threads…■ Access your Instagram followersYour Instagram username and verification badge are reserved for you. Automatically follow the same accounts you follow on Instagram in a few taps, and discover new accounts too.■ Share your point of viewSpin up a new thread to express what's on your mind. This is your space to be yourself, and you control who can reply.■ Connect with friends and your favorite creatorsJump to the replies to get in on the action and react to commentary, humor and insight from the creators you know and love. Find your community and connect with people who care about whatever it is you’re interested in.■ Control the conversationCustomize your settings and use controls to manage who can see your content, reply to your threads, or mention you. Accounts you’ve blocked will carry over from Instagram, and we’re enforcing the same Community Guidelines to help ensure everyone interacts safely and authentically. ■ Find ideas and inspirationFrom TV recommendations to career advice, get answers to your questions or learn something new from crowd-sourced conversations, thought leaders and industry experts.■ Never miss a momentStay on top of the latest trends and live events. 
Whether it’s about new music, movie premieres, sports, games, TV shows, fashion, or the latest product releases, find discussions and receive notifications any time your favorite profiles start a new thread.■ Open social networking – coming soonIn the future, there will be ways to discover more content and reach wider audiences: we are planning features that allow you to search for, follow and interact with users on open, interoperable social networks that we believe can shape the future of the internet.Meta Terms: https://www.facebook.com/terms.phpThreads Supplemental Terms: https://help.instagram.com/769983657850450Meta Privacy Policy: https://privacycenter.instagram.com/policyThreads Supplemental Privacy Policy: https://help.instagram.com/515230437301944Instagram Community Guidelines: https://help.instagram.com/477434105621119",
"currency":"USD",
"free":true,
"thumbnail":"https://play-lh.googleusercontent.com/G6jK9S77RN0laf9_6nhDo3AVxbRP9SgMmt8ZmQjKQ2hibn9xhOY-W5YFn_7stJD1CA",
"images":[
"https://play-lh.googleusercontent.com/4G1LubN-8kcV2zRU45ovPAmuesvS8ZGjB5ecyuNUzPgA72kG41RGHnptfFVHq-vp21BN",
"https://play-lh.googleusercontent.com/Fj_yQloJTOWcKTgVMrCMAWOZttmBDDRprbc0q6DpW8twqcr_2-EmpVEH1yQOpHu1hok",
"https://play-lh.googleusercontent.com/_NPxTj5BOQvyHtu9rPXSQIEU6KO7dQM1xI07E3AO5QCUpxxxQy3NVStDm-wW7feljNdd",
"https://play-lh.googleusercontent.com/TVA2tTHu52K4GHJ1QfknT-plgx5e85BihToslkUBH8Mb1msvBQfTBf1P5tFyceEqGXg",
"https://play-lh.googleusercontent.com/5IMjAtNfmq9l6cbyOg7ZIJsAF6NITdjigcToKP4mughPgOAuGrUKI_YmTHxfgaDZpo4",
"https://play-lh.googleusercontent.com/N6KWQZXuvrY-qV-CBrgBfGYrw3ibCeOg1lLAACNtEK2O4dXJ4ImjNXIOPvF9cvBXcVQ"
]
}
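Once the response is parsed, records shaped like the example above can be summarized with the standard library as a first step toward exploratory analysis. The following is a minimal sketch, not part of any SearchApi client: `summarize_by_category` is a hypothetical helper, the sample records are made up, and it assumes you have already pulled the list of app dictionaries out of the response (the top-level key holding that list is not shown here, so check the documentation).

```python
from collections import defaultdict
from statistics import mean


def summarize_by_category(apps):
    """Group app records by category and report count and mean rating.

    `apps` is a list of dicts with "category" and "rating" keys, as in the
    single-app example. Records without a rating still count toward the
    category total but are left out of the mean.
    """
    ratings = defaultdict(list)
    counts = defaultdict(int)
    for app in apps:
        category = app.get("category") or "Unknown"
        counts[category] += 1
        if app.get("rating") is not None:
            ratings[category].append(app["rating"])

    summary = {}
    for category, count in counts.items():
        rated = ratings[category]
        summary[category] = {
            "count": count,
            "mean_rating": round(mean(rated), 2) if rated else None,
        }
    return summary


# Made-up sample records, for illustration only
sample = [
    {"title": "App A", "category": "Social", "rating": 3.3},
    {"title": "App B", "category": "Social", "rating": 4.5},
    {"title": "App C", "category": "Tools", "rating": None},
]
print(summarize_by_category(sample))
```

The same list of dictionaries can be handed to a DataFrame library if you prefer to do the analysis there.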
Documentation: https://www.searchapi.io/docs/google-play-store
Disclaimer: I work for SearchApi
Upvotes: 0
Reputation: 99
In the new Google Play UI, the most downloaded apps are likely displayed on the home page or in the Top Charts section.
One way to collect them is the SerpApi `google_play` engine. The main difference is that it is faster than driving a browser with, for example, selenium or playwright. The API bypasses blocks by Google, and you don't have to create a parser from scratch and maintain it when HTML elements change in the future.
To retrieve data for multiple devices (phone, tablet, etc.), use the device parameter: keep a list of devices, iterate over it, and assign each value before making the request to the API:
devices = ['phone', 'tablet']
for device in devices:
    params = {
        "api_key": os.getenv("API_KEY"),  # serpapi key, https://serpapi.com/manage-api-key
        "device": device,                 # device used for sorting results
        "engine": "google_play",          # serpapi parser engine
        "hl": "en",                       # language
        "store": "apps",                  # apps store
        "gl": "us"                        # country of the search, US -> USA
    }
Pagination is handled with a while loop that keeps running until the exit condition is met, which happens when there is no next page to load:
# if next page is there, grab it and pass to GoogleSearch()
# otherwise, stop.
if "next" in results.get("serpapi_pagination", {}):
    search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination").get("next")).query)))
else:
    apps_is_present = False
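The long `update(...)` line packs three steps together: split the "next" URL, parse its query string into key/value pairs, and merge those pairs into the current request parameters. A small standalone sketch of that step, using only the standard library; `next_page_params` is a hypothetical helper, and the URL and its `next_page_token` parameter are placeholders rather than real API output:

```python
from urllib.parse import urlsplit, parse_qsl


def next_page_params(next_url):
    """Return the query-string parameters of a pagination URL as a dict."""
    return dict(parse_qsl(urlsplit(next_url).query))


# Hypothetical "next" URL; the parameter names are placeholders
example_next = "https://example.com/search?engine=google_play&store=apps&next_page_token=abc123"

# Parameters of the current request, merged with those of the next page
params = {"engine": "google_play", "store": "apps", "hl": "en"}
params.update(next_page_params(example_next))
print(params)
```

Keys already present in `params` are overwritten by the values from the URL, and keys missing from the URL (such as `hl` here) are kept.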
You can check the full code in the online IDE:
from serpapi import GoogleSearch
from urllib.parse import urlsplit, parse_qsl
import json, os

devices = ['phone', 'tablet']
apps_data = []

for device in devices:
    params = {
        "api_key": os.getenv("API_KEY"),  # serpapi key, https://serpapi.com/manage-api-key
        "device": device,                 # device used for sorting results
        "engine": "google_play",          # serpapi parser engine
        "hl": "en",                       # language
        "store": "apps",                  # apps store
        "gl": "us"                        # country of the search, US -> USA
    }

    search = GoogleSearch(params)  # where data extraction happens

    index = 0
    apps_is_present = True
    while apps_is_present:
        results = search.get_dict()  # JSON -> Python dict
        index += 1

        for result in results.get("organic_results", []):
            for app in result["items"]:
                apps_data.append({
                    "page": index,
                    "title": app.get("title"),
                    "link": app.get("link"),
                    "product_id": app.get("product_id"),
                    "description": app.get("description"),
                    "rating": app.get("rating")
                })

        # if next page is there, grab it and pass to GoogleSearch()
        # otherwise, stop.
        if "next" in results.get("serpapi_pagination", {}):
            search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination").get("next")).query)))
        else:
            apps_is_present = False

print(json.dumps(apps_data[:2], indent=2, ensure_ascii=False))  # prints the first two elements of the list
Example output:
[
{
"page": 1,
"title": "Google Photos",
"link": "https://play.google.com/store/apps/details?id=com.google.android.apps.photos",
"product_id": "com.google.android.apps.photos",
"description": null,
"rating": 4.5
},
{
"page": 1,
"title": "Gmail",
"link": "https://play.google.com/store/apps/details?id=com.google.android.gm",
"product_id": "com.google.android.gm",
"description": null,
"rating": 4.2
}
]
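Because the loop runs once per device, the same app may end up in `apps_data` more than once (an assumption; it depends on how much the phone and tablet results overlap). Before analysis it can help to drop repeats and save the rest. A minimal sketch with the standard library, where `dedupe_by_product_id` and `to_csv` are hypothetical helpers and the records follow the shape of the example output:

```python
import csv
import io


def dedupe_by_product_id(apps):
    """Keep the first record seen for each product_id, preserving order.

    Records without a product_id are always kept.
    """
    seen = set()
    unique = []
    for app in apps:
        product_id = app.get("product_id")
        if product_id is not None and product_id in seen:
            continue
        seen.add(product_id)
        unique.append(app)
    return unique


def to_csv(apps, fieldnames=("page", "title", "product_id", "rating")):
    """Serialize the selected fields of each record to a CSV string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(apps)
    return buffer.getvalue()


# Records shaped like the example output, with one repeat
records = [
    {"page": 1, "title": "Google Photos", "product_id": "com.google.android.apps.photos", "rating": 4.5},
    {"page": 1, "title": "Gmail", "product_id": "com.google.android.gm", "rating": 4.2},
    {"page": 1, "title": "Gmail", "product_id": "com.google.android.gm", "rating": 4.2},
]
print(to_csv(dedupe_by_product_id(records)))
```

Writing the string returned by `to_csv` to a file gives you something you can open in a spreadsheet or load into an analysis tool.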
There's also a blog post, Scrape Google Play Search Apps in Python, if you need more explanation of the code.
Disclaimer, I work for SerpApi.
Upvotes: 0
Reputation: 2414
Another option for scraping Google Play Store data is the google-play-scraper package:
https://pypi.org/project/google-play-scraper/
If you just want sample code, here it is:
from google_play_scraper import app
result = app(
    'com.nianticlabs.pokemongo',
    lang='en',    # defaults to 'en'
    country='us'  # defaults to 'us'
)
print(result)
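Note that a per-app lookup like this is unlikely to give you downloads for the previous month; as far as I know, the store only shows a cumulative install figure. A rough sketch for ranking apps by that figure follows. It assumes each result dict carries an `installs` string such as "100,000,000+" (an assumption about the library's output, so check the keys of `result` in your version), and `parse_installs` and `rank_by_installs` are hypothetical helpers, not part of the package:

```python
import re


def parse_installs(text):
    """Convert an install label such as "1,000,000+" to an int lower bound.

    Returns None when the label is missing or has no digits.
    """
    digits = re.sub(r"[^\d]", "", text or "")
    return int(digits) if digits else None


def rank_by_installs(results):
    """Sort result dicts by parsed installs, largest first; unknowns go last."""
    def sort_key(result):
        installs = parse_installs(result.get("installs"))
        return (installs is None, -(installs or 0))
    return sorted(results, key=sort_key)


# Made-up records, for illustration only
fetched = [
    {"title": "App A", "installs": "10,000+"},
    {"title": "App B", "installs": None},
    {"title": "App C", "installs": "100,000,000+"},
]
for item in rank_by_installs(fetched):
    print(item["title"], parse_installs(item["installs"]))
```

To use it with the snippet above, collect the dict returned by `app(...)` for each package ID into a list and pass that list to `rank_by_installs`.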
Upvotes: 1