Reputation: 23
I am working on a Python project where I need to find out which apps a company owns. For example, I have a list:
company_name = ['Airbnb', 'WeFi']
I would like to write a Python function/program that does the following:
1. Automatically search the Play Store for each item in the list.
2. Treat a developer as a match if the company name matches, even if only the first word matches, e.g. "Airbnb" should match "Airbnb,inc".
If the company has more than one app, do the same for every app.
Each app's information is stored in a tuple (app name, category).
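Point 2 boils down to comparing the first word of each name while ignoring case and punctuation. A minimal sketch of that rule, assuming ASCII names; `first_word` and `developer_matches` are hypothetical helper names:

```python
import re


def first_word(name: str) -> str:
    # Split on anything that is not a lowercase letter or digit,
    # so "Airbnb,inc" -> ["airbnb", "inc"] -> "airbnb"
    words = re.findall(r"[a-z0-9]+", name.lower())
    return words[0] if words else ""


def developer_matches(company: str, developer: str) -> bool:
    # Hypothetical helper: a developer matches when its first word equals
    # the company's first word (case-insensitive, punctuation ignored)
    first = first_word(company)
    return bool(first) and first == first_word(developer)


print(developer_matches("Airbnb", "Airbnb,inc"))   # True
print(developer_matches("Airbnb", "Google Inc."))  # False
```

First-word matching can produce false positives: any developer whose name happens to start with the same word will match.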
The desired end result is a list of tuples, for example:
print(company_name[0])
print(type(company_name[0]))
outcome:
airbnb
tuple
print(company_name[0][0])
outcome:
[('airbnb','Travel')]
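One way to hold the desired result is a dict keyed by company, where each value is a list of `(app_name, category)` tuples. The name `company_apps` is hypothetical, and the Airbnb entry uses the values shown in the scraping answer further down:

```python
# Hypothetical result structure: company -> list of (app_name, category) tuples
company_apps = {
    "airbnb": [("Airbnb", "Travel & Local")],
}

print(company_apps["airbnb"])           # [('Airbnb', 'Travel & Local')]
print(type(company_apps["airbnb"][0]))  # <class 'tuple'>
print(company_apps["airbnb"][0][0])     # Airbnb
```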
This combines several things I don't know well yet, and I am new to Python, so please give me some direction on how I should start writing the code.
I learned that Selenium can automate clicking a "load more" button, but I am not sure exactly which package I should use.
Upvotes: 1
Views: 1504
Reputation: 33974
Here's another option for searching Google Play programmatically:
https://github.com/facundoolano/google-play-scraper/#list
var gplay = require('google-play-scraper');
gplay.list({
category: gplay.category.GAME_ACTION,
collection: gplay.collection.TOP_FREE,
num: 2
})
.then(console.log, console.log);
(It's Node.js, not Python, though.)
Upvotes: 0
Reputation: 1052
I've written a little demo that may help you achieve your goal. It uses requests and Beautiful Soup. It's not exactly what you asked for, but it can be adapted easily.
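Whichever way the search URL is built, the company name has to be URL-encoded, otherwise names containing spaces, commas or `&` produce a broken query string. Passing `params={"q": company_name}` to `requests.get` handles this for you; a minimal standard-library sketch of the same idea, with a hypothetical `build_search_url` helper:

```python
from urllib.parse import urlencode


def build_search_url(company_name: str) -> str:
    # urlencode percent-escapes reserved characters ("&" -> %26, "," -> %2C)
    # and turns spaces into "+", so any company name yields a valid query
    return "https://play.google.com/store/search?" + urlencode({"q": company_name})


print(build_search_url("airbnb"))  # https://play.google.com/store/search?q=airbnb
print(build_search_url("AT&T"))    # https://play.google.com/store/search?q=AT%26T
```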
import requests
import bs4

company_name = "airbnb"

def get_company(company_name):
    # Let requests URL-encode the query (spaces, commas, "&", ...)
    r = requests.get("https://play.google.com/store/search", params={"q": company_name})
    soup = bs4.BeautifulSoup(r.text, "html.parser")
    subtitles = soup.find_all("a", {"class": "subtitle"})
    dev_urls = []
    for title in subtitles:
        try:
            text = title.attrs["title"].lower()
        except KeyError:
            # Sometimes there is a subtitle without any text on Google Play;
            # skip it instead of crashing
            continue
        # Compare case-insensitively so "Airbnb" also matches "airbnb, inc"
        if company_name.lower() in text:
            url = "https://play.google.com" + title.attrs["href"]
            dev_urls.append(url)
    return dev_urls

def get_company_apps_url(dev_url):
    # Collect the link of every app listed on a developer page
    r = requests.get(dev_url)
    soup = bs4.BeautifulSoup(r.text, "html.parser")
    titles = soup.find_all("a", {"class": "title"})
    return ["https://play.google.com" + title.attrs["href"] for title in titles]

def get_app_category(app_url):
    # These classes/itemprops match the Play Store markup at the time of writing;
    # if Google changes its HTML, find() returns None and .text raises AttributeError
    r = requests.get(app_url)
    soup = bs4.BeautifulSoup(r.text, "html.parser")
    developer_name = soup.find("span", {"itemprop": "name"}).text
    app_name = soup.find("div", {"class": "id-app-title"}).text
    category = soup.find("span", {"itemprop": "genre"}).text
    return (developer_name, app_name, category)

dev_urls = get_company("airbnb")
apps_urls = get_company_apps_url(dev_urls[0])
get_app_category(apps_urls[0])
>>> get_company("airbnb")
['https://play.google.com/store/apps/developer?id=Airbnb,+Inc']
>>> get_company_apps_url("https://play.google.com/store/apps/developer?id=Airbnb,+Inc")
['https://play.google.com/store/apps/details?id=com.airbnb.android']
>>> get_app_category("https://play.google.com/store/apps/details?id=com.airbnb.android")
('Airbnb, Inc', 'Airbnb', 'Travel & Local')
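The demo stops at `dev_urls[0]` and `apps_urls[0]`. To get the structure the question asks for, the three functions can be looped over every company, every matching developer page and every app. A sketch, with a hypothetical `collect_company_apps` that takes the three functions as arguments so it can be tried offline with fake data (the fake URLs and values are the ones from the session above):

```python
from typing import Callable, Dict, List, Tuple

AppInfo = Tuple[str, str]  # (app_name, category)


def collect_company_apps(
    companies: List[str],
    get_company: Callable[[str], List[str]],
    get_company_apps_url: Callable[[str], List[str]],
    get_app_category: Callable[[str], Tuple[str, str, str]],
) -> Dict[str, List[AppInfo]]:
    """Map each company to a list of (app_name, category) tuples."""
    result: Dict[str, List[AppInfo]] = {}
    for company in companies:
        apps: List[AppInfo] = []
        seen = set()  # avoid fetching the same app twice
        for dev_url in get_company(company):  # every matching developer, not just [0]
            for app_url in get_company_apps_url(dev_url):
                if app_url in seen:
                    continue
                seen.add(app_url)
                _developer, app_name, category = get_app_category(app_url)
                apps.append((app_name, category))
        result[company] = apps
    return result


# Offline demo: fake lookups standing in for the scraping functions
DEV = "https://play.google.com/store/apps/developer?id=Airbnb,+Inc"
APP = "https://play.google.com/store/apps/details?id=com.airbnb.android"
fake_devs = {"airbnb": [DEV]}
fake_apps = {DEV: [APP]}
fake_details = {APP: ("Airbnb, Inc", "Airbnb", "Travel & Local")}

demo = collect_company_apps(
    ["airbnb"],
    lambda c: fake_devs.get(c, []),
    lambda d: fake_apps.get(d, []),
    lambda a: fake_details[a],
)
print(demo)  # {'airbnb': [('Airbnb', 'Travel & Local')]}
```

With the real scrapers it would be called as `collect_company_apps(["airbnb", "wefi"], get_company, get_company_apps_url, get_app_category)`.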
Running the script with "google":
dev_urls = get_company("google")
apps_urls = get_company_apps_url(dev_urls[0])
for app in apps_urls:
    print(get_app_category(app))
('Google Inc.', 'Google Duo', 'Communication')
('Google Inc.', 'Google Translate', 'Tools')
('Google Inc.', 'Google Photos', 'Photography')
('Google Inc.', 'Google Earth', 'Travel & Local')
('Google Inc.', 'Google Play Games', 'Entertainment')
('Google Inc.', 'Google Calendar', 'Productivity')
('Google Inc.', 'YouTube', 'Media & Video')
('Google Inc.', 'Chrome Browser - Google', 'Communication')
('Google Inc.', 'Google Cast', 'Tools')
('Google Inc.', 'Google Sheets', 'Productivity')
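Since each printed line is a `(developer, app_name, category)` tuple, reshaping it into the `(app_name, category)` pairs the question asks for, or grouping apps by category, takes only a few lines. A sketch using `collections.defaultdict`, with the Google results above hard-coded as input:

```python
from collections import defaultdict

# Output of the "google" run above, hard-coded so the example runs offline
google_apps = [
    ("Google Inc.", "Google Duo", "Communication"),
    ("Google Inc.", "Google Translate", "Tools"),
    ("Google Inc.", "Google Photos", "Photography"),
    ("Google Inc.", "Google Earth", "Travel & Local"),
    ("Google Inc.", "Google Play Games", "Entertainment"),
    ("Google Inc.", "Google Calendar", "Productivity"),
    ("Google Inc.", "YouTube", "Media & Video"),
    ("Google Inc.", "Chrome Browser - Google", "Communication"),
    ("Google Inc.", "Google Cast", "Tools"),
    ("Google Inc.", "Google Sheets", "Productivity"),
]

# Drop the developer column to get the (app_name, category) pairs from the question
app_pairs = [(app_name, category) for _developer, app_name, category in google_apps]

# Group app names by category, keeping the original order within each group
by_category = defaultdict(list)
for app_name, category in app_pairs:
    by_category[category].append(app_name)

for category in sorted(by_category):
    print(f"{category}: {', '.join(by_category[category])}")
```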
Upvotes: 3