I have a Python project where my scripts are stored in one folder and my packages and data in other folders, and I am trying to find the best structure and procedures for making references between these items more robust:
project_dir/
    data/
        raw/
            source_1.csv
            source_2.csv
        processed/
            tidydata.csv
            results.csv
    src/
        scripts/
            clean_raw_data.py
            calc_results.py
        packages/
            import_tools/
                tool_a.py
                tool_b.py
            calc_tools/
    Makefile
I want to be able to robustly reference my packages (./src/packages) through imports and my data (./data) through file read and write operations, from any of my scripts in the ./src/scripts folder.
My current setup involves doing things like this:
To import packages (calling functions just to be able to import other modules seems like bad practice):
# clean_raw_data.py
import sys
from pathlib import Path
sys.path.append(str(Path(__file__).parent.parent))
import packages.import_tools as imptool
To read and write files:
import pandas as pd
df = pd.read_csv('../../data/raw/source_1.csv')
# operations
df.to_csv('../../data/processed/tidydata.csv')
Ideally, I would like everything to be referenced relative to the project folder project_dir from any file or script in my structure, so that I could do things like:
import src.packages.import_tools as imptool
df = pd.read_csv(f'{ROOT_DIR}/data/raw/source_1.csv')
Or something along those lines. I presume there is a best-practice guideline for configuring things to behave this way, but I haven't seen any good recommendations. What would be the best approach for handling this?
In Python, the mechanisms for referencing data files and source code are completely different. When you open a data file, you always have to give its path explicitly, either as an absolute path or relative to the current working directory (not to the location of the script). For imports, on the other hand, Python automatically searches the directories listed in sys.path for the modules you request. However, "hacking" sys.path manually in all your script files is bad practice. Instead, use pip to install your project in editable mode:
pip install --editable path/to/project_dir
but first make sure there is a minimal setup.py in project_dir with the following contents:
from setuptools import setup
setup(name='myproject')
pip will then register the project as installed in your site-packages folder (the exact files it creates there depend on your pip and setuptools versions), which you can verify via
pip show myproject
This allows you to import your packages with so-called absolute imports, which always start from your project_dir:
from src.packages.import_tools import tool_a
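(Note that your import packages.import_tools as imptool only imports the package import_tools itself: because it is a package and not a module, its submodules such as tool_a are not loaded unless you import them explicitly or the package's __init__.py imports them.)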
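To see what importing a package does and does not give you, the sketch below builds two throwaway packages in a temporary directory: bare_tools with an empty __init__.py, and reexport_tools whose __init__.py runs from . import tool_a. Both package names and the greet function are hypothetical stand-ins for import_tools and its modules, and the sys.path.insert call exists only to keep the demo self-contained (after the editable install you would not need it).

```python
import sys
import tempfile
from pathlib import Path

# Hypothetical module body standing in for tool_a.py
TOOL_A = "def greet():\n    return 'hello from tool_a'\n"


def make_package(root: Path, name: str, init_source: str) -> None:
    """Write a throwaway package <root>/<name>/ containing __init__.py and tool_a.py."""
    pkg = root / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text(init_source)
    (pkg / "tool_a.py").write_text(TOOL_A)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    make_package(root, "bare_tools", "")                          # empty __init__.py
    make_package(root, "reexport_tools", "from . import tool_a\n")  # re-exports tool_a
    sys.path.insert(0, tmp)  # demo only
    try:
        import bare_tools
        import reexport_tools

        # Importing a package does not load its submodules by itself...
        bare_has_tool_a = hasattr(bare_tools, "tool_a")
        # ...unless its __init__.py imports them, as reexport_tools does.
        message = reexport_tools.tool_a.greet()
        # An explicit submodule import always works.
        import bare_tools.tool_a
        bare_after_import = hasattr(bare_tools, "tool_a")
    finally:
        sys.path.remove(tmp)

print(bare_has_tool_a, message, bare_after_import)  # False hello from tool_a True
```

Whether to re-export submodules in __init__.py is a design choice: it makes import_tools.tool_a available after a plain package import, at the cost of loading every listed submodule even when only one is needed.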
The next things you could add to your project_dir are a README.md, a requirements.txt and a tests folder for your unit tests. Also keep in mind that the distinction between scripts and packages is somewhat artificial, because every Python file is basically a module that can be imported.
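For the data files, a ROOT_DIR like the one in the question can be computed from a module's own location instead of from the current working directory. The sketch below assumes project_dir contains a marker file that exists nowhere else in the tree (setup.py here, since the editable install already needs one); find_project_root is a hypothetical helper, not part of any library. Walking up to a marker is used instead of a fixed Path(__file__).parents[2] so the lookup keeps working if a file moves to a different depth.

```python
import tempfile
from pathlib import Path


def find_project_root(start: Path, marker: str = "setup.py") -> Path:
    """Walk up from `start` and return the first directory containing `marker`.

    The default marker name is an assumption; any file that only lives in
    project_dir (setup.py, Makefile, ...) would work.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no {marker!r} found in {start} or its parents")


# Demo on a throwaway copy of the layout: project_dir/setup.py + src/scripts/
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp).resolve()
    (project / "setup.py").write_text("")  # stand-in for the real setup.py
    scripts_dir = project / "src" / "scripts"
    scripts_dir.mkdir(parents=True)

    found = find_project_root(scripts_dir)
    found_is_project = found == project
    print(found_is_project)                            # True
    print(found / "data" / "raw" / "source_1.csv")     # path under the temp project
```

In the real project, a module such as a hypothetical src/packages/paths.py could define ROOT_DIR = find_project_root(Path(__file__).parent), and a script could then call pd.read_csv(ROOT_DIR / "data" / "raw" / "source_1.csv"); pandas accepts pathlib.Path objects, and the result no longer depends on the directory the script is launched from.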