Reputation: 2183
I'm using make, I believe GNU Make, on WSL. I'm using it for data science, building on cookiecutter-datascience.
Before I started using make, my understanding was that its purpose is to keep track of which parts of a pipeline have changed and to rerun only the stages that come after them.
Here's a snippet from the makefile:
## Install Python Dependencies
requirements: test_environment
	pip install -U pip setuptools wheel
	pip install -r requirements.txt

## Make Dataset
data: requirements
	$(PYTHON_INTERPRETER) src/data/make_dataset.py
When I run make data, it doesn't just rerun the data part of the pipeline and the succeeding stages; it also reruns make requirements and make test_environment. This is the opposite of what I want: those stages come before, not after. If I have an expensive pipeline, I obviously don't want to rerun it over and over again.
In my case, I want the following: if one of the raw (un-preprocessed) data files changes, make should rerun the data preprocessing. This should not include things like tracking whether the libraries have changed, because those steps logically precede the data preprocessing.
Upvotes: 0
Views: 179
Reputation: 29240
You can try this:
## Install Python Dependencies
requirements.done: test_environment.done
	pip install -U pip setuptools wheel
	pip install -r requirements.txt
	touch requirements.done

## Make Dataset
data: requirements.done
	$(PYTHON_INTERPRETER) src/data/make_dataset.py
Make compares the last-modification times of files. Your requirements, test_environment, and so on are not files; they are what is called "phony" targets. Because they never exist, make rebuilds them every time they are needed. If you want make to recognize that something is up to date and does not need to be rebuilt, you must use files. The proposed solution uses empty dummy files (the *.done files) instead of your phony targets. These files serve only to record, in their last-modification times, when each action last ran.
Of course, you can use files named requirements, test_environment, and so on if you prefer. The .done extension is just a way to identify these files as dummy markers.
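The same marker-file idea can be extended to the data stage itself, so that preprocessing reruns only when a raw input changes. The following is a minimal sketch under stated assumptions, not the project's actual Makefile: the data/raw/ directory, the data.done marker, and the python3 default for PYTHON_INTERPRETER are hypothetical names chosen for illustration, and the test_environment.done prerequisite is left out for brevity. It relies on GNU Make's order-only prerequisites (the part after the `|`), which are built if missing but whose timestamps do not trigger a rebuild of the target.

```makefile
# Sketch only: data/raw/, data.done and the python3 default are assumptions.
PYTHON_INTERPRETER ?= python3
RAW_FILES := $(wildcard data/raw/*)

## Install Python Dependencies
# Reruns only when requirements.txt is newer than the marker file.
requirements.done: requirements.txt
	pip install -U pip setuptools wheel
	pip install -r requirements.txt
	touch requirements.done

## Make Dataset
# Normal prerequisites (raw files, the script) trigger a rebuild when newer.
# The order-only prerequisite after `|` must exist, but a newer timestamp
# on requirements.done does not by itself rerun the preprocessing.
data.done: $(RAW_FILES) src/data/make_dataset.py | requirements.done
	$(PYTHON_INTERPRETER) src/data/make_dataset.py
	touch data.done
```

With this layout, running make data.done a second time without touching anything under data/raw/ should report that the target is up to date, while modifying a raw file (or the script) reruns only the preprocessing step. Each recipe line must start with a tab character.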
Upvotes: 1