Marcin

Reputation: 332

How to check if all packages listed in requirements.txt file are used in Python project

I have a requirements file which contains all installed packages. After a big refactoring of the project, some of the listed packages are no longer needed; the problem is I'm not sure which ones. Is there a way to determine which packages listed in the requirements.txt file are actually used in the code?

Upvotes: 9

Views: 9088

Answers (2)

blackbrandt

Reputation: 2095

Alternative answer that uses a Python library: pipreqs. See Automatically create requirements.txt.

Running pipreqs with the default arguments will generate a requirements.txt for you.

$ pipreqs /home/project/location
Successfully saved requirements file in /home/project/location/requirements.txt

However, it looks like you're trying to clean up an old requirements.txt file. In this case, pipreqs also comes with a --diff flag and a --clean flag. From the docs:

--diff <file>         Compare modules in requirements.txt to project imports.
--clean <file>        Clean up requirements.txt by removing modules that are not imported in project.

You can use --diff to determine which libraries need to be removed, and --clean to do it automagically.
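
For example, something like the following (the argument placement here is an assumption based on the help text above, so double-check with pipreqs --help for your version):

$ pipreqs --diff requirements.txt /home/project/location
$ pipreqs --clean requirements.txt /home/project/location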

Upvotes: 5

Gino Mempin

Reputation: 29678

If you are using a virtual environment and have a decent enough test suite that you can run repeatedly and automatically (ideally via a script or a simple command in your local workspace), then a brute-force approach is to:

  1. Set up a fresh copy of your project (ex. git clone onto a separate folder)
  2. Set up an empty/blank virtual environment
  3. Run your tests
  4. Install the missing package every time you encounter a "ModuleNotFoundError" type of error
  5. Repeat until all tests pass
  6. Export the packages you now have to a separate requirements.txt file (pip freeze > requirements.new.txt or any of the other ways from Automatically create requirements.txt)

For example, we have this (very) minimal example code, all in a single main.py:

# --- APP CODE

from numpy import array
from openpyxl import Workbook
from pydantic import BaseModel, ValidationError

class MyModel(BaseModel):
    x: int

# --- TEST CODE

import pytest

def test_my_model():
    with pytest.raises(ValidationError):
        MyModel(x="not an int")

After cloning this and setting up a brand-new virtual environment (but without installing any of the packages yet), the first attempt to run the tests yields:

(my_venv) $ pytest --maxfail=1 main.py
bash: /Users/me/.venvs/my-proj-7_w3b8eb/bin/pytest: No such file or directory

So then, you install pytest.

Then, you get:

(my_venv) $ pytest --maxfail=1 main.py
...
main.py:1: in <module>
    from numpy import array
E   ModuleNotFoundError: No module named 'numpy'

So then you install numpy.

Then, you get:

(my_venv) $ pytest --maxfail=1 main.py
...
main.py:2: in <module>
    from openpyxl import Workbook
E   ModuleNotFoundError: No module named 'openpyxl'

So then you install openpyxl.

...and so on, until all the tests pass. Of course, even when your automated tests pass, it's good to also run some basic manual tests to make sure everything is indeed working as before. Finally, generate a new copy of your requirements.txt file and compare it with the old one to check for any differences.
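
For that last comparison, something like this works (the pip freeze part is from step 6 above; diff is just one way to compare the two files):

(my_venv) $ pip freeze > requirements.new.txt
(my_venv) $ diff requirements.txt requirements.new.txt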

Of course, as I mentioned at the start, this assumes you have a decent enough test suite that covers a large percentage of your code and use cases. (It's also one of the good reasons why you should be writing tests in the first place.)
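
As a side note, the install-and-retry loop (steps 3 to 5) can be partially scripted. Here is a rough sketch, not part of the workflow above; it assumes pytest is already installed in the fresh virtual environment and that the missing module name reported by Python matches the installable package name, which is not always the case (for example, the module yaml comes from the package PyYAML):

import re
import subprocess
import sys

def run_tests_and_install(max_rounds: int = 20) -> None:
    # Matches the error shown in the pytest output above.
    pattern = re.compile(r"ModuleNotFoundError: No module named '([^']+)'")
    for _ in range(max_rounds):
        # Run the test suite and capture its output.
        result = subprocess.run(
            [sys.executable, "-m", "pytest", "--maxfail=1", "main.py"],
            capture_output=True,
            text=True,
        )
        if result.returncode == 0:
            print("All tests pass.")
            return
        match = pattern.search(result.stdout + result.stderr)
        if not match:
            # The failure is not a missing module, so stop and show the output.
            print(result.stdout)
            return
        missing = match.group(1)
        print(f"Installing missing package: {missing}")
        subprocess.run([sys.executable, "-m", "pip", "install", missing], check=True)

if __name__ == "__main__":
    run_tests_and_install()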

Upvotes: 1
