Steve Nathan

Reputation: 119

Relative path in python code - best practice

I have a python codebase with the following folder structure

project/
│
├── src/
│   ├── utils/
│   ├── data/
│   ├── module1/
│   └── module2/
│
└── data/
    ├── raw/
    └── processed/

I have an app.py file in module1 that imports utils/utils.py. app.py calls a function in utils.py and passes it a path, say 'data/raw/test.pdf'; the function in utils.py fetches the file and returns a result. What is the best way to express the path so that it works in both locations, assuming the code is invoked from a different directory?

Using a simple relative path like

import pathlib

path = '../../data/raw/test.pdf'
file = pathlib.Path(__file__).parent / path

isn't much help: the path is relative to module1/app.py, but once it is passed to utils/utils.py, the same relative string may resolve to a different location.

What is the best practice for referencing paths that may be passed from a function in one directory to another function in a different project (different directory)? Do we have to resort to using absolute paths in such cases?

Upvotes: 2

Views: 5035

Answers (2)

Serge Ballesta

Reputation: 148900

The problem with relative paths is what they are relative to. You can always extract the source path of a module... except that it can easily break tests that use mocks, and it can easily be broken by subclassing: the module where a subclass is defined may not be the one where the parent class is defined.

For a large framework, the common way is to use an environment variable to store the installation root path, because accessing the environment later is both easy and cheap. But it means a non-standard installation procedure (in the sense that it is not a simple pip install)...
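A minimal sketch of the environment-variable approach, assuming the installation procedure exports a variable with the root path. The name PROJECT_ROOT and the helper functions are hypothetical and chosen only for illustration:

```python
import os
from pathlib import Path

# Hypothetical variable name; any name agreed on at install time would do.
ENV_VAR = "PROJECT_ROOT"


def project_root() -> Path:
    """Return the installation root stored in the environment."""
    try:
        return Path(os.environ[ENV_VAR])
    except KeyError:
        # Fail loudly instead of silently falling back to the working directory.
        raise RuntimeError(f"{ENV_VAR} is not set") from None


def data_file(relative: str) -> Path:
    """Build a path below the root from a root-relative string."""
    return project_root() / relative
```

With this in place, app.py and utils.py can both pass around 'data/raw/test.pdf' and call data_file() at the point where the file is actually opened.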

For smaller packages, the __init__.py file can be used. It is only executed when the package is loaded, and can easily store its path (or better its folder path, or a relative path containing data files) in a global variable. That path can later be accessed as module_name.data_path_name. Here again it is easy and cheap because it is only computed once.
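The __init__.py idea could look roughly like the sketch below. To keep it runnable anywhere, it writes a throwaway package called demo_pkg into a temporary directory; in a real project the lines in INIT_SOURCE would simply live in the package's own __init__.py. The names demo_pkg, PACKAGE_DIR and DATA_DIR are assumptions made for this example:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Contents of the hypothetical package's __init__.py: the paths are computed
# once, when the package is first imported, and stored as module globals.
INIT_SOURCE = (
    "from pathlib import Path\n"
    "PACKAGE_DIR = Path(__file__).resolve().parent\n"
    "DATA_DIR = PACKAGE_DIR / 'data'\n"
)

# Create the throwaway package on disk and make it importable.
tmp = Path(tempfile.mkdtemp())
(tmp / "demo_pkg").mkdir()
(tmp / "demo_pkg" / "__init__.py").write_text(INIT_SOURCE)
sys.path.insert(0, str(tmp))

# Any module, in any directory, can now read the stored path.
demo_pkg = importlib.import_module("demo_pkg")
pdf_path = demo_pkg.DATA_DIR / "raw" / "test.pdf"
```

Because the path is anchored to the package's own location, the result does not depend on the current working directory or on which module does the lookup.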

Upvotes: 2

Niel Godfrey P. Ponciano

Reputation: 10709

My suggestion is to create a settings file that defines BASE_DIR for your whole project: an absolute path that is used as the reference for everything else. Here it would be /path/to/project. Django, for example, uses this style.

Then, all references to files should be based on that path. So for your case, it would be src/data/raw/test.pdf, which should be treated as {BASE_DIR}/src/data/raw/test.pdf, i.e. /path/to/project/src/data/raw/test.pdf.

  • This way, we can use the same path src/data/raw/test.pdf everywhere, because it points to the same file regardless of where we are in the project, whether in an outer folder or in a deeply nested file. No more headaches working out how many ../ are needed, and no need to change any relative paths if the file structure changes during a refactoring.
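A minimal sketch of such a settings module, under the assumption that BASE_DIR is hard-coded to the placeholder /path/to/project used above. In a real settings.py it would more likely be derived from the file's own location, for example with Path(__file__).resolve().parent. The helper project_path is hypothetical:

```python
from pathlib import Path

# Placeholder root taken from the text above; not a real location.
BASE_DIR = Path("/path/to/project")


def project_path(relative: str, base_dir: Path = BASE_DIR) -> Path:
    """Turn a project-relative string into a path under base_dir."""
    candidate = Path(relative)
    if candidate.is_absolute():
        # Absolute input would silently discard base_dir when joined.
        raise ValueError(f"expected a project-relative path, got {relative!r}")
    return base_dir / candidate
```

Both app.py and utils.py would then exchange the plain string 'src/data/raw/test.pdf' and call project_path() only where the file is opened.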

Upvotes: 1
