Prometheus

Reputation: 33655

How to see sizes of installed pip packages?

On Linux Debian, how can I list all installed python pip packages and the size (amount of disk space used) that each one takes up?

Upvotes: 90

Views: 89956

Answers (16)

Banuprasad B

Reputation: 11

Here is code that prints the size of each installed Python package larger than 1 MB, followed by the total size in MB:

import os
import pkg_resources

def calc_container(path):
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            total_size += os.path.getsize(fp)
    return total_size

dists = [d for d in pkg_resources.working_set]
total_size = 0

for dist in dists:
    try:
        path = os.path.join(dist.location, dist.project_name)
        size = calc_container(path)
        total_size += size
        if size / (1024*1024) > 1.0:
            print(f"{dist}: {size / (1024*1024):.2f} MB")
            print("-" * 40)
    except OSError:
        print(f"{dist.project_name} no longer exists")

print("Total size of installed packages:", total_size / (1024*1024), "MB")

Upvotes: -1

shredEngineer

Reputation: 460

I like @Tirtha's solution. Here's my upgraded version that takes the path to a requirements.txt as an optional argument and only shows the sizes of the packages contained therein.

Useful if you want to know the size of dependencies for a specific project.

import os
import sys
import pkg_resources


# Usage:  python3 pipsize.py [requirements.txt]
if len(sys.argv) == 2:
    with open(sys.argv[1], 'r') as file:
        requirements = file.read().splitlines()
else:
    requirements = []


def calc_container(path):
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            total_size += os.path.getsize(fp)
    return total_size


dists = [d for d in pkg_resources.working_set]


for dist in dists:

    if requirements:
        if dist.project_name not in requirements:
            continue

    try:
        path = os.path.join(dist.location, dist.project_name)
        size = calc_container(path)
        if size/1000 > 1.0:
            print (f"{dist}: {size/1000} KB")
            print("-"*40)
    except OSError:
        print(f"{dist.project_name} no longer exists")
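
One caveat with the membership test above: `dist.project_name not in requirements` only matches lines that consist of the bare project name, so a line such as `numpy==1.20.3` would be skipped. Below is a minimal sketch, assuming simple requirement lines (no URLs, no editable installs), that reduces each line to a normalized project name. The helper names `normalize` and `requirement_names` are made up for illustration.

```python
import re


def normalize(name):
    """Normalize a project name as PEP 503 does: lowercase, runs of -_. become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(lines):
    """Return the normalized project names found in simple requirements lines."""
    names = set()
    for line in lines:
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line or line.startswith("-"):   # skip blanks and pip options
            continue
        # The name ends at the first extras, version, marker or space character.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(normalize(match.group(0)))
    return names
```

In the loop you would then compare `normalize(dist.project_name)` against the returned set instead of the raw lines.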

Upvotes: 0

Igor

Reputation: 1737

A modified version of Marko Kohtala's answer:
One-liner:

python -c "for d in __import__('importlib.metadata').metadata.distributions(): print('{:>12.3f} KiB   {}'.format(sum(0 if not f.locate().is_file() else f.locate().stat().st_size for f in d.files) / 1024, d.name))"

The same, but more readable:

import importlib.metadata
for d in importlib.metadata.distributions():
    d_size = 0
    for f in d.files:
        if f.locate().is_file():
            d_size += f.locate().stat().st_size
    print('{:>12.3f} KiB   {}'.format(d_size/1024, d.name))

Example output:

      60.752 KiB   multipledispatch
     318.895 KiB   natsort
   64329.371 KiB   numpy
     288.076 KiB   packaging
   54892.789 KiB   pandas
      28.006 KiB   pandas-flavor
    7185.510 KiB   pip
   77101.011 KiB   pyarrow
    1088.491 KiB   pyjanitor      
     644.466 KiB   python-dateutil
    1033.665 KiB   pytz
  147559.953 KiB   scipy
    3810.577 KiB   setuptools
      64.252 KiB   six       
     303.010 KiB   tabulate  
     572.733 KiB   tzdata
     523.449 KiB   wheel
    9488.667 KiB   xarray

Motivation for this modification:

  1. uses st_size (size in bytes) instead of st_blocks (size taken on disk)
  2. hence works on both Windows and Linux (Python 3.10)
  3. resilient to missing files (personally, I run into them a lot)
  4. slightly better formatting
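
To illustrate point 1, here is a small sketch that reports both numbers for a single file. The function name `file_sizes` is hypothetical; it relies on the Python documentation describing `st_blocks` as a count of 512-byte blocks and on that attribute being absent on some platforms such as Windows.

```python
import os


def file_sizes(path):
    """Return (apparent_bytes, on_disk_bytes) for one file.

    on_disk_bytes is None where st_blocks is unavailable (e.g. Windows).
    """
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)
    # Python documents st_blocks as the number of 512-byte blocks allocated.
    on_disk = blocks * 512 if blocks is not None else None
    return st.st_size, on_disk
```

The two values usually differ a little, because the on-disk figure is rounded up to whole blocks.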

Upvotes: 6

Petr Mach

Reputation: 429

New version for new pip list format:

pip2 list --format freeze \
  | awk -F = '{print $1}' \
  | xargs pip2 show \
  | grep -E 'Location:|Name:' \
  | cut -d ' ' -f 2 \
  | paste -d ' ' - - \
  | awk '{print $2 "/" tolower($1)}' \
  | xargs du -sh \
  2> /dev/null \
  | sort -h

Upvotes: 29

lord63. j

Reputation: 4670

Could you please try this one (a bit long, though; maybe there are better solutions):

$ pip list \
  | xargs pip show \
  | grep -E 'Location:|Name:' \
  | cut -d ' ' -f 2 \
  | paste -d ' ' - - \
  | awk '{print $2 "/" tolower($1)}' \
  | xargs du -sh \
  2> /dev/null

The output should look like this:

80K     /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages/blinker
3.8M    /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages/docutils
296K    /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages/ecdsa
340K    /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages/execnet
564K    /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages/fabric
1.4M    /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages/flask
316K    /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages/httplib2
1.9M    /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages/jinja2
...

This should work if the package is installed in Location/Name (Location and Name come from pip show <package>).


pip show <package> will show you the location:

---
Metadata-Version: 2.0
Name: Flask
Version: 0.10.1
Summary: A microframework based on Werkzeug, Jinja2 and good intentions
Home-page: http://github.com/mitsuhiko/flask/
Author: Armin Ronacher
Author-email: [email protected]
License: BSD
Location: /home/lord63/.pyenv/versions/2.7.11/envs/py2/lib/python2.7/site-packages
Requires: itsdangerous, Werkzeug, Jinja2

We join Location and Name to get the package's directory, and finally use du -sh to get the package size.
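
The grep/cut/paste/awk steps can also be sketched in Python. This is only an illustration, assuming the `Name:` and `Location:` lines look like the sample output above; the function name `package_dirs` is made up.

```python
import os


def package_dirs(pip_show_output):
    """Yield Location/name paths from the text printed by `pip show`."""
    name = None
    for line in pip_show_output.splitlines():
        key, _, value = line.partition(": ")
        if key == "Name":
            name = value.strip()
        elif key == "Location" and name is not None:
            # Same guess as the awk step: directory = Location + lowercased Name.
            yield os.path.join(value.strip(), name.lower())
            name = None
```

Like the shell version, it only guesses the directory; packages whose folder name differs from the project name will not be found.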

Upvotes: 41

jerrymouse

Reputation: 17812

Modified for pip version 18 and above:

pip list \
  | tail -n +3 \
  | awk '{print $1}' \
  | xargs pip show \
  | grep -E 'Location:|Name:' \
  | cut -d ' ' -f 2 \
  | paste -d ' ' - - \
  | awk '{print $2 "/" tolower($1)}' \
  | xargs du -sh 2> /dev/null \
  | sort -hr

This command shows pip packages, sorted in descending order of size.

Upvotes: 95

Marco Bresson

Reputation: 152

Building on @Tirtha's and @AnsonH's answers, here is my version:

It features:

  • a line showing the total space,
  • a line showing the space taken by all the small libraries,
  • a table-like formatting to display everything in decreasing order.

# Run `python pipsize.py` in Terminal to show size of pip packages
# Credits: https://stackoverflow.com/a/67914559/11067496
# Credits: https://gist.github.com/AnsonH/fd634ba4298376f2abd8e00f99b01be8

import os
import pkg_resources

sort_in_descending = True   # Show packages in descending order


def calc_container(path):
    total_size = 0
    for dirpath, _, filenames in os.walk(path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            total_size += os.path.getsize(fp)
    return total_size


total_size = 0
max_size = 0
max_dist_length = 0
dists = [d for d in pkg_resources.working_set]
dists_with_size = {}

for dist in dists:
    try:
        max_dist_length = max(max_dist_length, len(str(dist)))
        path = os.path.join(dist.location, dist.project_name)
        size = calc_container(path)
        total_size += size
        max_size = max(max_size, size)
        dists_with_size[size] = dist
    except OSError:
        print('{} no longer exists'.format(dist.project_name))

# Sort packages size
dists_with_size = dict(sorted(dists_with_size.items(), reverse=sort_in_descending))


def str_spacer(name: str, max_len: int = max_dist_length) -> str:
    n_spaces = max_len - len(str(name))
    return f"{n_spaces * ' '}"


def human_readable_size(size: int, decimal_places: int = 2, max_unit: str = "PiB"):
    units = ['B', 'KiB', 'MiB', 'GiB', 'TiB', 'PiB']

    if max_unit not in units:
        raise ValueError(f"specified max unit not in available units. Available units: {units}")

    for unit in units:
        if size < 1024.0 or unit == max_unit:
            break
        size /= 1024.0

    return f"{size:.{decimal_places}f} {unit}"


def table_printer(text: str, size: int):
    print(f"{text} {str_spacer(text)}{human_readable_size(size, max_unit='MiB')}")


# print total statement
table_printer("TOTAL", total_size)
max_size_text = human_readable_size(max_size, max_unit="MiB")
print("=" * (1 + max_dist_length + len(max_size_text)))

# print size for each distro
count_small_libs = 0
small_lib_size = 0
for size, dist in dists_with_size.items():
    if size/1000000 > 1.0:
        table_printer(dist, size)
    else:
        count_small_libs += 1
        small_lib_size += size

# print remaining size for small distros
small_lib_text = f"{count_small_libs} libs smaller than 1.0 MB"
print()
table_printer(small_lib_text, small_lib_size)

Running the script in python outputs:

TOTAL                           1341.58 MiB
==========================================
kaleido 0.2.1                   253.34 MiB
torch 1.13.0                    232.95 MiB
scipy 1.8.1                     93.77 MiB
pyarrow 10.0.0                  81.60 MiB
safetensors 0.4.1               1.14 MiB
fsspec 2023.12.2                1.08 MiB
coverage 7.4.0                  1.05 MiB
pyod 1.1.2                      1.03 MiB
pycparser 2.21                  1001.23 KiB

92 libs smaller than 1.0 MB     27.70 MiB

Upvotes: 0

Marko Kohtala

Reputation: 931

Starting with Python 3.10 you can get the on-disk sizes of installed Python packages using a script like

import importlib.metadata
for d in importlib.metadata.distributions():
    print(sum(f.locate().stat().st_blocks*512 for f in d.files), d.name)

Or from the command line, on a single line:

python -c 'for d in __import__("importlib.metadata").metadata.distributions(): print(sum(f.locate().stat().st_blocks*512 for f in d.files), d.name)'

It works starting with Python 3.8 if you replace d.name with d.metadata['Name'].

Upvotes: 1

Tyler Neill

Reputation: 21

On Mac, I navigate to the site-packages folder and do

du -h -d 1 | sort -rh | grep -v "dist-info"   

On Linux, older versions of du may need --max-depth=1 instead of -d 1; otherwise the same command should work.

Upvotes: 2

Samir Kape

Reputation: 2065

Here's how:

  1. pip3 show numpy | grep "Location:"
  2. this will return path/to/all/packages
  3. du -h path/to/all/packages
  4. the last line will contain the total size of all packages (in human-readable units)

Note: You may put any package name in place of numpy
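
The same steps can be sketched in Python without going through pip. This assumes the packages live in the running interpreter's own site-packages directory, which `sysconfig` reports as `purelib`; the helper name `dir_size` is made up.

```python
import os
import sysconfig


def dir_size(path):
    """Total apparent size in bytes of all regular files below path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            if not os.path.islink(full):   # do not follow or count symlinks
                total += os.path.getsize(full)
    return total


if __name__ == "__main__":
    # "purelib" is the site-packages directory of the running interpreter.
    site_packages = sysconfig.get_paths()["purelib"]
    print(site_packages, f"{dir_size(site_packages) / (1024 * 1024):.1f} MiB")
```

Unlike du, this adds up file lengths rather than allocated blocks, so the total may come out slightly smaller.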

Upvotes: 5

JMzance

Reputation: 1746

Go to the package site to find the size e.g. https://pypi.python.org/pypi/pip/json

Then expand releases, find the version, and look up the size (in bytes).
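
If you would rather not click through the JSON by hand, the lookup can be sketched like this. It assumes the document has the usual shape, a `releases` mapping from version to a list of files that each carry `filename` and `size`; the function names are made up. Keep in mind these are the sizes of the downloadable archives, not of the installed package.

```python
import json
import urllib.request


def release_file_sizes(doc, version):
    """Return {filename: size_in_bytes} for one version in a PyPI JSON document."""
    return {f["filename"]: f["size"] for f in doc["releases"].get(version, [])}


def fetch_project_json(name):
    """Download the JSON document for a project (needs network access)."""
    with urllib.request.urlopen(f"https://pypi.org/pypi/{name}/json") as resp:
        return json.load(resp)
```

For example, `release_file_sizes(fetch_project_json("pip"), "21.1.2")` would list the files published for that version.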

Upvotes: 18

aqm

Reputation: 3034

If you only want the sizes of all currently installed packages, you can run step 1 by itself: python tool-size.py will total them all up for you.

If you want to know the exact size of a particular pip package including all its dependencies, I've created a little bash and Python combo to achieve this.

( based off the excellent package walking code answer above https://stackoverflow.com/a/67914559/3248788 )

Steps :

  1. create a python script to check all currently installed pip packages
  2. create a shell script to create a brand new python environment and install package to test, and run the script from step 1
  3. run shell script
  4. profit :)

Step 1

create a python script called tool-size.py

#!/usr/bin/env python

import os
import pkg_resources

def calc_container(path):
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            total_size += os.path.getsize(fp)
    return total_size

def calc_installed_sizes():
    dists = [d for d in pkg_resources.working_set]

    total_size = 0
    print (f"Size of Dependencies")
    print("-"*40)
    for dist in dists:
        # ignore pre-installed pip and setuptools
        if dist.project_name in ["pip", "setuptools"]:
            continue
        try:
            path = os.path.join(dist.location, dist.project_name)
            size = calc_container(path)
            total_size += size
            if size/1000 > 1.0:
                print (f"{dist}: {size/1000} KB")
                print("-"*40)
        except OSError:
            print('{} no longer exists'.format(dist.project_name))

    print (f"Total Size (including dependencies): {total_size/1000} KB")

if __name__ == "__main__":
    calc_installed_sizes()

Step 2

create a bash script called tool-size.sh

#!/usr/bin/env bash

# uncomment to debug
# set -x

rm -rf ~/.virtualenvs/tool-size-tester
python -m venv ~/.virtualenvs/tool-size-tester
# Scripts/activate is the Windows (e.g. Git Bash) layout; on Linux and macOS use bin/activate
source ~/.virtualenvs/tool-size-tester/Scripts/activate
pip install -q "$1"
python tool-size.py
deactivate

Step 3

run script with package you want to get the size of

tool-size.sh xxx

say for truffleHog3

$ ./tool-size.sh truffleHog3

Size of Dependencies
----------------------------------------
truffleHog3 2.0.6: 56.46 KB
----------------------------------------
smmap 4.0.0: 108.808 KB
----------------------------------------
MarkupSafe 2.0.1: 40.911 KB
----------------------------------------
Jinja2 3.0.1: 917.551 KB
----------------------------------------
gitdb 4.0.7: 320.08 KB
----------------------------------------
Total Size (including dependencies): 1443.81 KB

Upvotes: 1

Tirtha

Reputation: 738

There is a simple Pythonic way to find it out though.

Here is the code. Let's call this file pipsize.py.

import os
import pkg_resources

def calc_container(path):
    total_size = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            total_size += os.path.getsize(fp)
    return total_size



dists = [d for d in pkg_resources.working_set]

for dist in dists:
    try:
        path = os.path.join(dist.location, dist.project_name)
        size = calc_container(path)
        if size/1000 > 1.0:
            print (f"{dist}: {size/1000} KB")
            print("-"*40)
    except OSError:
        print('{} no longer exists'.format(dist.project_name))

When run with python pipsize.py this will print out something like,

pip 21.1.2: 8651.906 KB
----------------------------------------
numpy 1.20.3: 25892.871 KB
----------------------------------------
numexpr 2.7.3: 1627.361 KB
----------------------------------------
zict 2.0.0: 48.54 KB
----------------------------------------
yarl 1.6.3: 1395.888 KB
----------------------------------------
widgetsnbextension 3.5.1: 4609.962 KB
----------------------------------------
webencodings 0.5.1: 54.768 KB
----------------------------------------
wcwidth 0.2.5: 452.214 KB
----------------------------------------
uvicorn 0.14.0: 257.515 KB
----------------------------------------
tzlocal 2.1: 67.11 KB
----------------------------------------
traitlets 5.0.5: 800.71 KB
----------------------------------------
tqdm 4.61.0: 289.412 KB
----------------------------------------
tornado 6.1: 2898.264 KB

Upvotes: 25

yellowsoar

Reputation: 73

How

 $ du -h -d 1 "$(pip -V | cut -d ' ' -f 4 | sed 's/pip//g')" | grep -vE "dist-info|_distutils_hack|__pycache__" | sort -h

Pros

No need to convert these:
case (Django:django)
hyphen (django-q:django_q)
naming (djangorestframework-gis:rest_framework_gis)

Cons

Dependencies and some unknown directories revealed as well...
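
If you do want to map a project name to its real folder names (the conversions listed under Pros), the file list recorded in the package metadata can be used instead of guessing. A rough sketch, assuming the paths are relative to site-packages as `importlib.metadata` reports them; the function name `top_level_entries` is made up.

```python
from pathlib import PurePosixPath


def top_level_entries(files):
    """Return the top-level names a distribution installs into site-packages.

    `files` is an iterable of relative paths such as
    importlib.metadata.distribution("Django").files.
    """
    names = set()
    for f in files:
        parts = PurePosixPath(str(f)).parts
        if not parts or parts[0] == "..":
            continue   # scripts and data installed outside site-packages
        first = parts[0]
        if first.endswith((".dist-info", ".egg-info")) or first == "__pycache__":
            continue   # metadata and bytecode caches, not importable code
        names.add(first)
    return sorted(names)
```

Passing `importlib.metadata.distribution("djangorestframework-gis").files or []` should then return the folder that project actually installs.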

Upvotes: 6

Synthesis

Reputation: 544

None of the above solutions list packages with dashes in their names, because pip converts the dashes to underscores in the folder names:

pip list --format freeze | awk -F = {'print $1'} | xargs pip show | grep -E 'Location:|Name:' | cut -d ' ' -f 2 | paste -d ' ' - - | awk '{gsub("-","_",$1); print $2 "/" tolower($1)}' | xargs du -sh 2> /dev/null | sort -h

And for Mac users:

pip3 list --format freeze | awk -F = {'print $1'} | xargs pip3 show | grep -E 'Location:|Name:' | cut -d ' ' -f 2 | paste -d ' ' - - | awk '{gsub("-","_",$1); print $2 "/" tolower($1)}' | xargs du -sh 2> /dev/null | sort -h

Upvotes: 8

intika

Reputation: 9762

History :

There is no command or application developed for that purpose at the moment, so we need to check manually.

Manual Method I :

du /usr/lib/python3.5/ --max-depth=2 | sort -h
du /usr/lib64/python3.5/ --max-depth=2 | sort -h

This does not include packages or files installed outside those directories; that said, these two simple commands cover about 95% of cases.

Also, if you have another version of Python installed, you need to adapt the directory.

Manual Method II :

pip list | sed '/Package/d' | sed '/----/d' | sed -r 's/\S+//2' | xargs pip show | grep -E 'Location:|Name:' | cut -d ' ' -f 2 | paste -d ' ' - - | awk '{print $2 "/" $(find $2 -maxdepth 1 -iname $1)}' | xargs du -sh  | sort -h

Searches the install directory for the package name, case-insensitively
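
The case-insensitive lookup can also be sketched in Python, which avoids relying on find inside the awk step. This is only an illustration; the function name `find_package_dir` is made up, and `location` and `name` stand for the Location and Name fields of pip show.

```python
import os


def find_package_dir(location, name):
    """Return the entry in `location` whose name equals `name` ignoring case, or None."""
    wanted = name.casefold()
    try:
        entries = os.listdir(location)
    except OSError:
        return None   # location is missing or unreadable
    for entry in entries:
        if entry.casefold() == wanted:
            return os.path.join(location, entry)
    return None
```

A `None` result corresponds to the du: cannot access case, where the folder has to be looked up by hand.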

Manual Method II Alternative I :

pip list | sed '/Package/d' | sed '/----/d' | sed -r 's/\S+//2' | xargs pip show | grep -E 'Location:|Name:' | cut -d ' ' -f 2 | paste -d ' ' - -| awk '{print $2 "/" tolower($1)}' | xargs du -sh | sort -h

Searches the install directory for the lowercased package name

Manual Method II Alternative II :

pip list | sed '/Package/d' | sed '/----/d' | sed -r 's/\S+//2' | xargs pip show | grep -E 'Location:|Name:' | cut -d ' ' -f 2 | paste -d ' ' - -| awk '{print $2 "/" $1}' | xargs du -sh | sort -h

Searches the install directory for the package name as-is

Note :

For the methods using du, output lines starting with du: cannot access need to be checked manually. The command takes the install directory and appends the package name to it, but sometimes the package name and the directory name differ.

Make it simple :

  • Use the first method, then
  • use the second method and manually check only the packages outside the standard Python directories

Upvotes: 3
