Reputation: 33655
On Linux Debian, how can I list all installed python pip packages and the size (amount of disk space used) that each one takes up?
Upvotes: 90
Views: 89956
Reputation: 11
Here is code that prints the size of each installed Python package larger than 1 MB, followed by the total size in MB:
import os
import pkg_resources
def calc_container(path):
    # Sum the sizes (in bytes) of all files below path
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            total_size += os.path.getsize(fp)
    return total_size
dists = [d for d in pkg_resources.working_set]
total_size = 0
for dist in dists:
    try:
        path = os.path.join(dist.location, dist.project_name)
        size = calc_container(path)
        total_size += size
        # Only list packages larger than 1 MB
        if size / (1024*1024) > 1.0:
            print(f"{dist}: {size / (1024*1024):.2f} MB")
            print("-" * 40)
    except OSError:
        print(f"{dist.project_name} no longer exists")
print("Total size of installed packages:", total_size / (1024*1024), "MB")
Upvotes: -1
Reputation: 460
I like @Tirtha's solution. Here's my upgraded version that takes the path to a requirements.txt
as an optional argument and only shows the sizes of the packages contained therein.
Useful if you want to know the size of dependencies for a specific project.
import os
import sys
import pkg_resources
# Usage: python3 pipsize.py [requirements.txt]
if len(sys.argv) == 2:
    with open(sys.argv[1], 'r') as file:
        requirements = file.read().splitlines()
else:
    requirements = []
def calc_container(path):
    # Sum the sizes (in bytes) of all files below path
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            total_size += os.path.getsize(fp)
    return total_size
dists = [d for d in pkg_resources.working_set]
for dist in dists:
    # With a requirements file, skip packages that are not listed in it
    if requirements:
        if dist.project_name not in requirements:
            continue
    try:
        path = os.path.join(dist.location, dist.project_name)
        size = calc_container(path)
        if size/1000 > 1.0:
            print(f"{dist}: {size/1000} KB")
            print("-"*40)
    except OSError:
        print(f"{dist.project_name} no longer exists")
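The membership test above compares project names against the raw lines of requirements.txt, so a pinned line such as numpy==1.20.3 would not match numpy. Below is a minimal sketch of extracting bare project names first. The helper names requirement_names and normalize are made up for this illustration, and the parsing assumes simple requirement lines (no URLs or environment markers).

```python
import re

# Leading project name of a requirement line (letters, digits, '.', '_', '-').
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name):
    """Lowercase and collapse runs of '-', '_' and '.' into one '-' (PEP 503 style)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(lines):
    """Return the set of normalized project names found in requirements.txt lines."""
    names = set()
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and options such as -r / -e
            continue
        match = _NAME_RE.match(line)
        if match:
            names.add(normalize(match.group(1)))
    return names


if __name__ == "__main__":
    sample = ["numpy==1.20.3", "Django>=3.2  # web framework", "python_dateutil", "", "-r other.txt"]
    print(sorted(requirement_names(sample)))
```

In the loop, the check would then become normalize(dist.project_name) not in requirement_names(requirements).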
Upvotes: 0
Reputation: 1737
A modified version of Marko Kohtala's answer:
One-liner:
python -c "for d in __import__('importlib.metadata').metadata.distributions(): print('{:>12.3f} KiB {}'.format(sum(0 if not f.locate().is_file() else f.locate().stat().st_size for f in d.files) / 1024, d.name))"
The same, but more readable:
import importlib.metadata
for d in importlib.metadata.distributions():
    d_size = 0
    for f in d.files:
        if f.locate().is_file():
            d_size += f.locate().stat().st_size
    print('{:>12.3f} KiB {}'.format(d_size/1024, d.name))
Example output:
60.752 KiB multipledispatch
318.895 KiB natsort
64329.371 KiB numpy
288.076 KiB packaging
54892.789 KiB pandas
28.006 KiB pandas-flavor
7185.510 KiB pip
77101.011 KiB pyarrow
1088.491 KiB pyjanitor
644.466 KiB python-dateutil
1033.665 KiB pytz
147559.953 KiB scipy
3810.577 KiB setuptools
64.252 KiB six
303.010 KiB tabulate
572.733 KiB tzdata
523.449 KiB wheel
9488.667 KiB xarray
Motivation for this modification:
It uses st_size (size in bytes) instead of st_blocks (size taken on disk).
Upvotes: 6
Reputation: 429
New version for new pip list format:
pip2 list --format freeze \
|awk -F = {'print $1'} \
| xargs pip2 show \
| grep -E 'Location:|Name:' \
| cut -d ' ' -f 2 \
| paste -d ' ' - - \
| awk '{print $2 "/" tolower($1)}' \
| xargs du -sh \
2> /dev/null \
|sort -h
Upvotes: 29
Reputation: 4670
Could you please try this one (it is a bit long; maybe there are better solutions):
$ pip list \
| xargs pip show \
| grep -E 'Location:|Name:' \
| cut -d ' ' -f 2 \
| paste -d ' ' - - \
| awk '{print $2 "/" tolower($1)}' \
| xargs du -sh \
2> /dev/null
The output should look like this:
80K /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages/blinker
3.8M /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages/docutils
296K /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages/ecdsa
340K /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages/execnet
564K /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages/fabric
1.4M /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages/flask
316K /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages/httplib2
1.9M /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages/jinja2
...
This should work if the package is installed in Location/Name (Location and Name are taken from the output of pip show <package>).
pip show <package> will show you the location:
---
Metadata-Version: 2.0
Name: Flask
Version: 0.10.1
Summary: A microframework based on Werkzeug, Jinja2 and good intentions
Home-page: http://github.com/mitsuhiko/flask/
Author: Armin Ronacher
Author-email: [email protected]
License: BSD
Location: /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages
Requires: itsdangerous, Werkzeug, Jinja2
We take the Name and Location fields and join them to get the package directory, then use du -sh to get the package size.
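The same Name and Location join can be sketched in Python. This is only an illustration of the idea, assuming the text of pip show has already been captured as a string; the function name package_dirs and the sample paths are hypothetical.

```python
import os


def package_dirs(pip_show_output):
    """Yield (name, candidate_dir) pairs from the text printed by `pip show`."""
    name = None
    for line in pip_show_output.splitlines():
        if line.startswith("Name: "):
            name = line[len("Name: "):].strip()
        elif line.startswith("Location: ") and name is not None:
            location = line[len("Location: "):].strip()
            # Same guess as the awk step: Location joined with the lowercased Name
            yield name, os.path.join(location, name.lower())
            name = None


if __name__ == "__main__":
    # Made-up sample in the shape of `pip show` output for two packages
    sample = (
        "Name: Flask\n"
        "Version: 0.10.1\n"
        "Location: /tmp/site-packages\n"
        "---\n"
        "Name: Jinja2\n"
        "Version: 2.8\n"
        "Location: /tmp/site-packages\n"
    )
    for name, path in package_dirs(sample):
        print(name, path)
```

As with the shell pipeline, the resulting path is only a guess: it is correct when the directory is named after the lowercased package name.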
Upvotes: 41
Reputation: 17812
Modified for pip version 18 and above:
pip list \
| tail -n +3 \
| awk '{print $1}' \
| xargs pip show \
| grep -E 'Location:|Name:' \
| cut -d ' ' -f 2 \
| paste -d ' ' - - \
| awk '{print $2 "/" tolower($1)}' \
| xargs du -sh 2> /dev/null \
| sort -hr
This command lists pip packages sorted by size, in descending order.
Upvotes: 95
Reputation: 152
Building on @Tirtha and @AnsonH answers, here is my version:
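It features sorting by size, a grand total, human-readable units, and a summary line for packages smaller than 1 MB: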
# Run `python pipsize.py` in Terminal to show size of pip packages
# Credits: https://stackoverflow.com/a/67914559/11067496
# Credits: https://gist.github.com/AnsonH/fd634ba4298376f2abd8e00f99b01be8
import os
import pkg_resources
sort_in_descending = True  # Show packages in descending order
def calc_container(path):
    total_size = 0
    for dirpath, _, filenames in os.walk(path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            total_size += os.path.getsize(fp)
    return total_size
total_size = 0
max_size = 0
max_dist_length = 0
dists = [d for d in pkg_resources.working_set]
# List of (size, dist) pairs; a dict keyed by size would drop packages of equal size
dists_with_size = []
for dist in dists:
    try:
        max_dist_length = max(max_dist_length, len(str(dist)))
        path = os.path.join(dist.location, dist.project_name)
        size = calc_container(path)
        total_size += size
        max_size = max(max_size, size)
        dists_with_size.append((size, dist))
    except OSError:
        print('{} no longer exists'.format(dist.project_name))
# Sort packages by size
dists_with_size.sort(key=lambda item: item[0], reverse=sort_in_descending)
def str_spacer(name: str, max_len: int = max_dist_length) -> str:
    n_spaces = max_len - len(str(name))
    return f"{n_spaces * ' '}"
def human_readable_size(size: int, decimal_places: int = 2, max_unit: str = "PiB"):
    units = ['B', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB']
    if max_unit not in units:
        raise ValueError(f"specified max unit not in available units. Available units: {units}")
    for unit in units:
        if size < 1024.0 or unit == max_unit:
            break
        size /= 1024.0
    return f"{size:.{decimal_places}f} {unit}"
def table_printer(text: str, size: int):
    print(f"{text} {str_spacer(text)}{human_readable_size(size, max_unit='MiB')}")
# print total statement
table_printer("TOTAL", total_size)
max_size_text = human_readable_size(max_size, max_unit="MiB")
print("=" * (1 + max_dist_length + len(max_size_text)))
# print size for each distro
count_small_libs = 0
small_lib_size = 0
for size, dist in dists_with_size:
    if size/1000000 > 1.0:
        table_printer(dist, size)
    else:
        count_small_libs += 1
        small_lib_size += size
# print remaining size for small distros
small_lib_text = f"{count_small_libs} libs smaller than 1.0 MB"
print()
table_printer(small_lib_text, small_lib_size)
Running the script in python outputs:
TOTAL 1341.58 MiB
==========================================
kaleido 0.2.1 253.34 MiB
torch 1.13.0 232.95 MiB
scipy 1.8.1 93.77 MiB
pyarrow 10.0.0 81.60 MiB
safetensors 0.4.1 1.14 MiB
fsspec 2023.12.2 1.08 MiB
coverage 7.4.0 1.05 MiB
pyod 1.1.2 1.03 MiB
pycparser 2.21 1001.23 KiB
92 libs smaller than 1.0 MB 27.70 MiB
Upvotes: 0
Reputation: 931
Starting with Python 3.10 you can get the on-disk sizes of installed Python packages using a script like
import importlib.metadata
for d in importlib.metadata.distributions():
    print(sum(f.locate().stat().st_blocks*512 for f in d.files), d.name)
Or from command line on single line
python -c 'for d in __import__("importlib.metadata").metadata.distributions(): print(sum(f.locate().stat().st_blocks*512 for f in d.files), d.name)'
It works starting with Python 3.8 if you replace d.name with d.metadata['Name'].
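The st_blocks*512 figure is the space allocated on disk, which can differ from the byte count in st_size. A minimal sketch for comparing the two on a single file follows; the helper name apparent_and_allocated is made up for this illustration, and st_blocks is assumed to be missing on platforms such as Windows.

```python
import os
import tempfile


def apparent_and_allocated(path):
    """Return (st_size, st_blocks * 512); the second value is None without st_blocks."""
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)  # not provided on Windows
    allocated = blocks * 512 if blocks is not None else None
    return st.st_size, allocated


if __name__ == "__main__":
    # Write a 5000-byte scratch file and compare the two figures
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 5000)
    try:
        print(apparent_and_allocated(tmp.name))
    finally:
        os.unlink(tmp.name)
```

The allocated figure depends on the filesystem, so it is usually rounded up to a whole number of filesystem blocks.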
Upvotes: 1
Reputation: 21
On Mac, I navigate to the site-packages folder and run:
du -h -d 1 | sort -rh | grep -v "dist-info"
On Linux you may need --max-depth=1 instead of -d 1 (recent versions of GNU du accept both); otherwise it should work the same way.
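For a portable variant of the same idea, here is a rough Python sketch that sums file sizes for each immediate subdirectory of site-packages and skips the dist-info folders. The function names are made up for this illustration, and it adds up apparent file sizes rather than the disk usage that du reports.

```python
import os
import sysconfig


def dir_size(path):
    """Sum the sizes in bytes of all regular files below path (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for filename in filenames:
            fp = os.path.join(dirpath, filename)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total


def first_level_sizes(root):
    """Return [(size, name)] for each immediate subdirectory of root, largest first."""
    sizes = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False) and not entry.name.endswith(".dist-info"):
                sizes.append((dir_size(entry.path), entry.name))
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    # Assumption: the pure-Python install directory reported by sysconfig is the one of interest
    root = sysconfig.get_paths()["purelib"]
    if os.path.isdir(root):
        for size, name in first_level_sizes(root):
            print(f"{size / 1024:>12.1f} KiB  {name}")
```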
Upvotes: 2
Reputation: 2065
Here's how,
pip3 show numpy | grep "Location:"
du -h path/to/all/packages
Note: You may put any package name in place of numpy
Upvotes: 5
Reputation: 1746
Go to the package's page on PyPI to find the size, e.g. https://pypi.python.org/pypi/pip/json
Then expand releases, find the version, and look up the size (in bytes).
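The lookup can also be scripted. The sketch below assumes the JSON has a releases mapping from version to a list of files, each with filename and size fields, as described above; the function names are made up, and the URL uses the pypi.org form of the same endpoint.

```python
import json
import urllib.request


def fetch_metadata(project):
    """Download the JSON metadata for a project from PyPI (needs network access)."""
    url = f"https://pypi.org/pypi/{project}/json"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def release_file_sizes(metadata, version):
    """Return {filename: size_in_bytes} for one version in the 'releases' mapping."""
    return {f["filename"]: f["size"] for f in metadata["releases"].get(version, [])}


if __name__ == "__main__":
    # Hand-written sample in the same shape as the 'releases' field; the numbers are made up
    sample = {
        "releases": {
            "1.0": [
                {"filename": "demo-1.0.tar.gz", "size": 1234},
                {"filename": "demo-1.0-py3-none-any.whl", "size": 2345},
            ]
        }
    }
    print(release_file_sizes(sample, "1.0"))
```

With network access, release_file_sizes(fetch_metadata("pip"), version) would list the files of a real release. These are the sizes of the downloadable archives, not the space taken after installation.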
Upvotes: 18
Reputation: 3034
You can run part 1 by itself for all the currently installed packages: python tool-size.py will total them all up for you.
If you want to know the exact size of a particular pip package including all its dependencies, I've created a little bash and Python combo to achieve this (based on the excellent package-walking code in the answer above: https://stackoverflow.com/a/67914559/3248788).
Create a Python script called tool-size.py:
#!/usr/bin/env python
import os
import pkg_resources
def calc_container(path):
    # Sum the sizes (in bytes) of all files below path
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            total_size += os.path.getsize(fp)
    return total_size
def calc_installed_sizes():
    dists = [d for d in pkg_resources.working_set]
    total_size = 0
    print("Size of Dependencies")
    print("-"*40)
    for dist in dists:
        # ignore pre-installed pip and setuptools
        if dist.project_name in ["pip", "setuptools"]:
            continue
        try:
            path = os.path.join(dist.location, dist.project_name)
            size = calc_container(path)
            total_size += size
            if size/1000 > 1.0:
                print(f"{dist}: {size/1000} KB")
                print("-"*40)
        except OSError:
            print('{} no longer exists'.format(dist.project_name))
    print(f"Total Size (including dependencies): {total_size/1000} KB")
if __name__ == "__main__":
    calc_installed_sizes()
Create a bash script called tool-size.sh:
#!/usr/bin/env bash
# uncomment to debug
# set -x
rm -rf ~/.virtualenvs/tool-size-tester
python -m venv ~/.virtualenvs/tool-size-tester
# Scripts/activate is the Windows (Git Bash) layout; on Linux and macOS use bin/activate
source ~/.virtualenvs/tool-size-tester/Scripts/activate
pip install -q "$1"
python tool-size.py
deactivate
Run the script with the package you want to get the size of:
tool-size.sh xxx
For example, for truffleHog3:
$ ./tool-size.sh truffleHog3
Size of Dependencies
----------------------------------------
truffleHog3 2.0.6: 56.46 KB
----------------------------------------
smmap 4.0.0: 108.808 KB
----------------------------------------
MarkupSafe 2.0.1: 40.911 KB
----------------------------------------
Jinja2 3.0.1: 917.551 KB
----------------------------------------
gitdb 4.0.7: 320.08 KB
----------------------------------------
Total Size (including dependencies): 1443.81 KB
Upvotes: 1
Reputation: 738
There is a simple Pythonic way to find it out though.
Here is the code. Let's call this file pipsize.py.
import os
import pkg_resources
def calc_container(path):
    # Sum the sizes (in bytes) of all files below path
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            total_size += os.path.getsize(fp)
    return total_size
dists = [d for d in pkg_resources.working_set]
for dist in dists:
    try:
        path = os.path.join(dist.location, dist.project_name)
        size = calc_container(path)
        if size/1000 > 1.0:
            print(f"{dist}: {size/1000} KB")
            print("-"*40)
    except OSError:
        print('{} no longer exists'.format(dist.project_name))
When run with python pipsize.py, this will print out something like:
pip 21.1.2: 8651.906 KB
----------------------------------------
numpy 1.20.3: 25892.871 KB
----------------------------------------
numexpr 2.7.3: 1627.361 KB
----------------------------------------
zict 2.0.0: 48.54 KB
----------------------------------------
yarl 1.6.3: 1395.888 KB
----------------------------------------
widgetsnbextension 3.5.1: 4609.962 KB
----------------------------------------
webencodings 0.5.1: 54.768 KB
----------------------------------------
wcwidth 0.2.5: 452.214 KB
----------------------------------------
uvicorn 0.14.0: 257.515 KB
----------------------------------------
tzlocal 2.1: 67.11 KB
----------------------------------------
traitlets 5.0.5: 800.71 KB
----------------------------------------
tqdm 4.61.0: 289.412 KB
----------------------------------------
tornado 6.1: 2898.264 KB
Upvotes: 25
Reputation: 73
$ du -h -d 1 "$(pip -V | cut -d ' ' -f 4 | sed 's/pip//g')" | grep -vE "dist-info|_distutils_hack|__pycache__" | sort -h
No need to convert these:
case (Django:django)
hyphen (django-q:django_q)
naming (djangorestframework-gis:rest_framework_gis)
Dependencies and some unknown directories revealed as well...
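To see which distribution owns each of those directories, one option is the file list recorded in the package metadata. This is a minimal sketch, assuming each installed distribution exposes its files through importlib.metadata; the helper name top_level_names is made up for this illustration.

```python
import importlib.metadata
from pathlib import PurePosixPath


def top_level_names(file_paths):
    """Return the sorted top-level names among relative paths, skipping metadata folders."""
    names = set()
    for path in file_paths:
        parts = PurePosixPath(str(path)).parts
        if not parts:
            continue
        first = parts[0]
        # Skip files outside site-packages (e.g. ../../bin/...), metadata and caches
        if first == ".." or first == "__pycache__" or first.endswith((".dist-info", ".egg-info")):
            continue
        names.add(first)
    return sorted(names)


if __name__ == "__main__":
    for dist in importlib.metadata.distributions():
        # dist.files can be None when the installer recorded no file list
        print(dist.metadata["Name"], top_level_names(dist.files or []))
```

For a distribution such as djangorestframework-gis, this would report the directory it actually installs instead of a name derived from the distribution name.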
Upvotes: 6
Reputation: 544
None of the above solutions list packages with dashes in their names: pip converts the dashes to underscores in the folder names:
pip list --format freeze | awk -F = {'print $1'} | xargs pip show | grep -E 'Location:|Name:' | cut -d ' ' -f 2 | paste -d ' ' - - | awk '{gsub("-","_",$1); print $2 "/" tolower($1)}' | xargs du -sh 2> /dev/null | sort -h
And for Mac users:
pip3 list --format freeze | awk -F = {'print $1'} | xargs pip3 show | grep -E 'Location:|Name:' | cut -d ' ' -f 2 | paste -d ' ' - - | awk '{gsub("-","_",$1); print $2 "/" tolower($1)}' | xargs du -sh 2> /dev/null | sort -h
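The name conversion done by the awk step can be written out as a small Python sketch. The function names below are made up for this illustration, and the result is still a guess that should be checked against the file system.

```python
import os


def candidate_folder(name):
    """Mirror the awk step: replace '-' with '_' and lowercase the result."""
    return name.replace("-", "_").lower()


def locate_folder(location, name):
    """Return the guessed package folder under location, or None if it does not exist."""
    path = os.path.join(location, candidate_folder(name))
    return path if os.path.isdir(path) else None


if __name__ == "__main__":
    for name in ["Django", "django-q", "python-dateutil"]:
        print(name, "->", candidate_folder(name))
```

The guess works when the folder is named after the package, as with Django and django-q. It misses packages whose folder has an unrelated name, such as python-dateutil, which installs a folder called dateutil.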
Upvotes: 8
Reputation: 9762
History :
There is no command or application developed for that purpose at the moment, so we need to check manually.
Manual Method I :
du /usr/lib/python3.5/ --max-depth=2 | sort -h
du /usr/lib64/python3.5/ --max-depth=2 | sort -h
This does not include packages or files installed outside those directories; that said, those two simple commands get us about 95% of the way.
Also, if you have another version of Python installed, you need to adapt the directory.
Manual Method II :
pip list | sed '/Package/d' | sed '/----/d' | sed -r 's/\S+//2' | xargs pip show | grep -E 'Location:|Name:' | cut -d ' ' -f 2 | paste -d ' ' - - | awk '{print $2 "/" $(find $2 -maxdepth 1 -iname $1)}' | xargs du -sh | sort -h
This searches the install directory for the package name, case-insensitively.
Manual Method II Alternative I :
pip list | sed '/Package/d' | sed '/----/d' | sed -r 's/\S+//2' | xargs pip show | grep -E 'Location:|Name:' | cut -d ' ' -f 2 | paste -d ' ' - -| awk '{print $2 "/" tolower($1)}' | xargs du -sh | sort -h
This looks up the package in the install directory using the lowercased package name.
Manual Method II Alternative II :
pip list | sed '/Package/d' | sed '/----/d' | sed -r 's/\S+//2' | xargs pip show | grep -E 'Location:|Name:' | cut -d ' ' -f 2 | paste -d ' ' - -| awk '{print $2 "/" $1}' | xargs du -sh | sort -h
This looks up the package in the install directory using the package name as-is.
Note :
For methods using du, output lines starting with du: cannot access need to be checked manually.
The command takes the install directory and appends the package name to it, but sometimes the package name and the directory name are different...
Make it simple :
Upvotes: 3