Reputation: 79
I am trying to extract the license information of installed pip packages from PyPI and load it into a pandas DataFrame. I have previously loaded the result of a list comprehension into a DataFrame, but I cannot figure this one out.
Here is what I have written so far:
from requests import get
import pandas as pd
import pip

url = 'https://pypi.python.org/pypi'
# packages_list = ['numpy','twisted']
installed_packages = pip.get_installed_distributions()
installed_packages_list = sorted(["%s==%s" % (i.key, i.version)
                                  for i in installed_packages])
packages = []
licenses = []
summarys = []
for index, package in enumerate(installed_packages_list):
    package = package.split("==")[0]
    full_url = url+'/'+ package +'/json'
    #print 'url is ' + full_url
    page = get(url+'/'+package+'/json').json()
    #print 'Package: ' + package + ', license is:' + page['info']['license'] + '. ' + page['info']['summary']
    packages.append(package)
    licenses.append(page['info']['license'])
    summarys.append(page['info']['summary'])
print packages
pd_packages = pd.DataFrame(
    {
        "packages":[packages],
        "licenses":[licenses],
        "summarys":[summarys]
    })
print pd_packages
Upvotes: 1
Views: 134
Reputation: 210832
Try this:
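import pandas as pd
import requests

def get_pkg_info(pkg, url_pat='https://pypi.python.org/pypi/{}/json'):
    # Fetch the package's JSON metadata from PyPI.
    r = requests.get(url_pat.format(pkg))
    if r.status_code != requests.codes.ok:
        # Package not found on PyPI (or another HTTP error): keep the row, leave fields empty.
        return [pkg, None, None]
    d = r.json()
    if d and 'info' in d:
        return [pkg, d['info'].get('license'), d['info'].get('summary')]
    else:
        return [pkg, None, None]

# installed_packages_list is the list of "name==version" strings from the question.
data = [get_pkg_info(x.split('==')[0]) for x in installed_packages_list]
df = pd.DataFrame(data, columns=['package','license','summary'])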
Demo:
In [166]: pd.options.display.max_rows = 15
In [167]: df = pd.DataFrame(data, columns=['package','license','summary'])
In [168]: df
Out[168]:
package license summary
0 alabaster None A configurable sidebar-enabled Sphinx theme
1 anaconda-client UNKNOWN Anaconda Cloud command line client library
2 anaconda-navigator Proprietary
3 anaconda-project None None
4 asn1crypto MIT Fast ASN.1 parser and serializer with definiti...
5 astroid LGPL A abstract syntax tree for Python with inferen...
6 astropy BSD Community-developed python astronomy tools
.. ... ... ...
216 xarray Apache N-D labeled arrays and datasets in Python
217 xlrd BSD Library for developers to extract data from Mi...
218 xlsxwriter BSD A Python module for creating Excel XLSX files.
219 xlwings BSD 3-clause Make Excel fly: Interact with Excel from Pytho...
220 xlwt BSD Library to create spreadsheet files compatible...
221 xmltodict MIT Makes working with XML feel like you are worki...
222 yapsy BSD Yet another plugin system
[223 rows x 3 columns]
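Note that pip.get_installed_distributions(), used in the question to build installed_packages_list, is no longer available as a public function in pip 10 and later. A minimal sketch of one alternative, assuming Python 3.8+ and the standard-library importlib.metadata module; the helper name installed_package_names is hypothetical and not part of the original code:

```python
from importlib import metadata


def installed_package_names():
    """Return the names of installed distributions, sorted case-insensitively.

    Hypothetical helper: stands in for the question's
    pip.get_installed_distributions() call.
    """
    names = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing or broken metadata
            names.add(name)
    return sorted(names, key=str.lower)


# Plain names, so no split('==') is needed before calling get_pkg_info().
installed_packages_list = installed_package_names()
```

With this list the comprehension above would simplify to data = [get_pkg_info(x) for x in installed_packages_list].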
Upvotes: 2
Reputation: 8483
I think the issue stems from how you create your DataFrame (pd_packages). packages, licenses and summarys are already lists, so writing [packages] wraps each one in another list. Each column then holds a single row containing the whole list, which explains the output in your comment below.
So instead of this:
pd_packages = pd.DataFrame(
    {
        "packages":[packages],
        "licenses":[licenses],
        "summarys":[summarys]
    })
try this:
pd.DataFrame(
    {
        "packages":packages,
        "licenses":licenses,
        "summarys":summarys
    })
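To see the difference, here is a small sketch with made-up values (the two package names and their license and summary strings are placeholders chosen for illustration, not data fetched from PyPI):

```python
import pandas as pd

# Placeholder data standing in for the lists built in the question's loop.
packages = ["numpy", "twisted"]
licenses = ["BSD", "MIT"]
summarys = ["array library", "networking engine"]

# Wrapping each list in another list gives ONE row whose cells are whole lists.
wrapped = pd.DataFrame({
    "packages": [packages],
    "licenses": [licenses],
    "summarys": [summarys],
})

# Passing the lists directly gives one row per package.
flat = pd.DataFrame({
    "packages": packages,
    "licenses": licenses,
    "summarys": summarys,
})

print(wrapped.shape)  # (1, 3)
print(flat.shape)     # (2, 3)
```

The wrapped version has a single row because each column was given a one-element list, and that one element happens to be the full list of values.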
Upvotes: 0