Wolpertinger

Reputation: 1301

Does the src/ folder in PyPI packaging have a special meaning or is it only a convention?

I'm learning how to package Python projects for PyPI according to the tutorial (https://packaging.python.org/en/latest/tutorials/packaging-projects/). For the example project, they use the folder structure:

packaging_tutorial/
├── LICENSE
├── pyproject.toml
├── README.md
├── src/
│   └── example_package_YOUR_USERNAME_HERE/
│       ├── __init__.py
│       └── example.py
└── tests/

I am wondering why the src/ folder is needed. Does it serve a particular purpose? Could one instead put the package directly in the top-level folder? For example, would

packaging_tutorial/
├── LICENSE
├── pyproject.toml
├── README.md
├── example_package_YOUR_USERNAME_HERE/
│   ├── __init__.py
│   └── example.py
└── tests/

have any disadvantages or cause complications?

Upvotes: 20

Views: 4525

Answers (1)

a_guest

Reputation: 36289

There is an interesting blog post about this topic. In short, using src prevents the package's source folder from being imported in place of the installed package when tests are run from within the project directory. Tests should always run against the installed package, so that the situation is the same as for a user.

Consider the following example project, where the package under development is named mypkg. It contains an __init__.py file and a non-code resource, DATA.txt:

.
├── mypkg
│   ├── DATA.txt
│   └── __init__.py
├── pyproject.toml
├── setup.cfg
└── test
    └── test_data.py

Here, mypkg/__init__.py accesses the DATA.txt resource and loads its content:

from importlib.resources import read_text

data = read_text('mypkg', 'DATA.txt').strip()  # The content is 'foo'.

The script test/test_data.py checks that mypkg.data actually contains 'foo':

import mypkg

def test():
    assert mypkg.data == 'foo'

Now, running coverage run -m pytest from within the base directory gives the impression that everything is alright with the project:

$ coverage run -m pytest
[...]
test/test_data.py .                                             [100%]

========================== 1 passed in 0.01s ==========================

However, there's a subtle issue. Running coverage run -m pytest runs pytest as a module, just as python -m pytest does. The -m switch has a side effect, as mentioned in the docs:

[...] As with the -c option, the current directory will be added to the start of sys.path. [...]

This means that the import mypkg statement in test/test_data.py did not pick up the installed version. It picked up the package from the source tree in mypkg instead, because the current directory came first on sys.path.
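To see the effect in isolation, here is a minimal sketch that builds a throwaway flat-layout project and asks the interpreter where mypkg would be imported from, once when running a file as a script and once via -m. The checks/probe.py helper and the demo function are made up for illustration, and the sketch assumes that no package named mypkg is installed in the environment.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical probe script: report where mypkg would be imported from.
PROBE = (
    "import importlib.util\n"
    "spec = importlib.util.find_spec('mypkg')\n"
    "print(spec.origin if spec else 'not importable')\n"
)


def run_probe(project: Path, *args: str) -> str:
    """Run the interpreter inside `project` and return what the probe printed."""
    result = subprocess.run(
        [sys.executable, *args],
        cwd=project, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def demo() -> tuple[str, str]:
    """Compare `python checks/probe.py` with `python -m checks.probe`."""
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)
        # Flat layout: the package sits directly in the project root.
        (project / "mypkg").mkdir()
        (project / "mypkg" / "__init__.py").write_text("")
        (project / "checks").mkdir()
        (project / "checks" / "__init__.py").write_text("")
        (project / "checks" / "probe.py").write_text(PROBE)
        # As a script, sys.path starts with the script's directory (checks/).
        as_script = run_probe(project, "checks/probe.py")
        # With -m, sys.path starts with the current directory (the project root).
        as_module = run_probe(project, "-m", "checks.probe")
        return as_script, as_module


if __name__ == "__main__":
    as_script, as_module = demo()
    print("script:", as_script)
    print("-m:    ", as_module)
```

Under those assumptions, the script run should report that mypkg is not importable, while the -m run should find mypkg/__init__.py inside the project directory.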

Now, let's further assume that we forgot to include the DATA.txt resource in our project specification (after all, there is no MANIFEST.in), so the file is not part of the installed version of mypkg (installed e.g. via python -m pip install .). This is revealed by running pytest directly, which, unlike -m, does not add the current directory to sys.path:

$ pytest
[...]
======================= short test summary info =======================
ERROR test/test_data.py - FileNotFoundError: [Errno 2] No such file ...
!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!
========================== 1 error in 0.13s ===========================
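Independently of how the tests are invoked, one could also inspect the built wheel itself, since a wheel is a zip archive. The following is only a sketch: the function name missing_from_wheel is made up here, and it assumes a wheel has already been built.

```python
import zipfile


def missing_from_wheel(wheel_path, expected_files):
    """Return the expected files that are absent from the wheel archive."""
    with zipfile.ZipFile(wheel_path) as wheel:
        present = set(wheel.namelist())
    # Keep the caller's order so the report is easy to read.
    return [name for name in expected_files if name not in present]
```

For the example project, calling it on the built wheel with ['mypkg/DATA.txt'] as the expected files would return that entry if the resource was left out of the package.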

Hence, when using coverage the test passed despite the installation of mypkg being broken. The test didn't capture this as it was run against the source tree rather than the installed version. If we had used a src directory to contain the mypkg package, then adding the current working directory via -m would have caused no problems, as there is no package mypkg in the current working directory anymore.
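If a project stays with the flat layout, a small guard in the test suite can at least make this situation fail loudly. This is a sketch, not an established pytest feature: the helper name imported_from_source_tree is an assumption, and the guard would also trigger for editable installs, where importing from the source tree is intended.

```python
from pathlib import Path


def imported_from_source_tree(module_file, source_dir):
    """Return True if module_file lies inside source_dir (symlinks resolved)."""
    module_path = Path(module_file).resolve()
    source_path = Path(source_dir).resolve()
    return source_path in module_path.parents


# Possible use in a test, assuming the tests are started from the project
# root of the flat-layout example:
#
#     import mypkg
#     assert not imported_from_source_tree(mypkg.__file__, "mypkg")
```

Comparing against the package's source directory rather than the whole working directory keeps the check valid when the virtual environment itself lives inside the project folder.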

In the end, though, using src is not a requirement but a convention/best practice. For example, requests did not use src at the time of writing and still managed to be a popular and successful project.

Upvotes: 18
