Reputation: 119
I have a Python codebase with the following folder structure:
project/
│
├── src/
│   ├── utils/
│   ├── data/
│   ├── module1/
│   └── module2/
│
└── data/
    ├── raw/
    └── processed/
I have an app.py file in module1 that imports utils/utils.py. app.py calls a function in utils.py and passes it a path, say 'data/raw/test.pdf'; the function in utils.py fetches the file and returns something back. What is the best way to refer to that path in both places, given that the program may be invoked from a different directory?
Using a simple relative path like

    import pathlib

    path = '../../data/raw/test.pdf'
    file = pathlib.Path(__file__).parent / path

isn't very helpful: the path is relative to module1/app.py, but when it is passed to utils/utils.py, the same relative reference may resolve differently.
What is the best practice for referencing paths that are passed from a function in one directory to a function in another module (a different directory)? Do we have to resort to absolute paths in such cases?
Upvotes: 2
Views: 5035
Reputation: 148900
The problem with relative paths is what they are relative to. You can always extract the source path of a module (its __file__), except that this can easily break tests that use mocks, and it can easily be broken by subclassing: the module where a subclass is defined may not be the one where the parent class is defined.
For a large framework, the common way is to use an environment variable to store the installation root path, because accessing the environment later is both easy and cheap. But it means a non-standard installation procedure (in the sense of not a simple pip install).
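A minimal Python sketch of the environment-variable approach. The variable name MYPROJECT_ROOT and the helpers project_root() and data_file() are hypothetical; the installation step is assumed to set the variable to the project root (for example /path/to/project).

```python
import os
from pathlib import Path

# Hypothetical variable name; the (non-pip) installation step is assumed to set it,
# e.g. export MYPROJECT_ROOT=/path/to/project
ROOT_ENV_VAR = "MYPROJECT_ROOT"


def project_root() -> Path:
    """Return the installation root stored in the environment."""
    try:
        return Path(os.environ[ROOT_ENV_VAR]).resolve()
    except KeyError:
        # Fail loudly instead of silently falling back to the working directory.
        raise RuntimeError(
            f"{ROOT_ENV_VAR} is not set; the installation step should define it"
        ) from None


def data_file(*parts: str) -> Path:
    """Build an absolute path under <root>/data, independent of the working directory."""
    return project_root() / "data" / Path(*parts)
```

With MYPROJECT_ROOT=/path/to/project set, app.py could pass the short parts ("raw", "test.pdf") and utils.py would resolve them to /path/to/project/data/raw/test.pdf, whichever directory the program was started from.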
For smaller packages, the package's __init__.py file can be used. It is executed only once, when the package is first imported, and can easily store the package's path (or better, its folder path, or the path of a subfolder containing data files) in a global variable. That path can later be accessed as module_name.data_path_name. Here again it is easy and cheap, because it is computed only once.
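A sketch of the __init__.py approach, assuming a hypothetical package named mypackage with a data/ subfolder; DATA_DIR plays the role of module_name.data_path_name above. The string INIT_SOURCE is what would actually go in mypackage/__init__.py; the rest of the code only writes that package to a temporary directory and imports it, so the example runs on its own.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# What a hypothetical package's __init__.py could contain. It runs once, on the
# first import, and stores the package's data folder in a module-level global.
INIT_SOURCE = textwrap.dedent(
    """
    from pathlib import Path

    # Folder of this package, resolved once at import time.
    PACKAGE_DIR = Path(__file__).resolve().parent
    # Data files shipped inside the package (assumed layout: mypackage/data/...).
    DATA_DIR = PACKAGE_DIR / "data"
    """
)


def build_demo_package(root: Path, name: str = "mypackage") -> None:
    """Write a throwaway package to disk so this example is self-contained."""
    pkg = root / name
    (pkg / "data" / "raw").mkdir(parents=True)
    (pkg / "__init__.py").write_text(INIT_SOURCE)
    (pkg / "data" / "raw" / "test.pdf").write_bytes(b"%PDF-1.4 placeholder")


with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(Path(tmp))
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # make sure the freshly written package is seen
    try:
        mypackage = importlib.import_module("mypackage")
        # Any module, in any directory, builds paths from the package global
        # instead of from its own location or the current working directory.
        pdf_path = mypackage.DATA_DIR / "raw" / "test.pdf"
        found = pdf_path.is_file()
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("mypackage", None)

print(found)  # True
```

In a real project the package would be installed (or already on sys.path), and callers would simply write `from mypackage import DATA_DIR`.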
Upvotes: 2
Reputation: 10709
My suggestion is to configure a settings file that defines a BASE_DIR for your whole project: an absolute path that is used as the reference point. Here it would be /path/to/project. Django, for example, uses this style.
Then all references to files should be based on that path. In your case the file would be referred to as src/data/raw/test.pdf, which is treated as {BASE_DIR}/src/data/raw/test.pdf, i.e. /path/to/project/src/data/raw/test.pdf.
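A minimal sketch of such a settings module. The file name settings.py, its location directly in the project root, and the helper project_path() are assumptions for illustration; Django's generated settings.py uses the same Path(__file__).resolve() idiom, but with .parent.parent because its settings file sits one level below the project root.

```python
# settings.py: assumed to sit directly in the project root (/path/to/project/settings.py)
from pathlib import Path

# Absolute project root, computed from this file's own location, so it does not
# depend on the directory the program is launched from.
BASE_DIR = Path(__file__).resolve().parent


def project_path(relative: str) -> Path:
    """Map a project-relative path such as 'src/data/raw/test.pdf'
    to {BASE_DIR}/src/data/raw/test.pdf."""
    if Path(relative).is_absolute():
        raise ValueError(f"expected a path relative to BASE_DIR, got {relative!r}")
    return BASE_DIR / relative
```

Assuming the project root is on sys.path so that settings is importable, both module1/app.py and utils/utils.py would import project_path (or BASE_DIR) and pass around only the short form 'src/data/raw/test.pdf'.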
A path like src/data/raw/test.pdf is better because it points to the same file regardless of where we are in the project, whether in an outer folder or in a deeply nested file. No more headaches working out how many ../ are needed, and no more changing relative paths when the file structure changes during a refactoring.
Upvotes: 1