Reputation: 5409
I'm a scientist analyzing brain data collected from multiple subjects. During the analysis, the data is processed through multiple steps, a bit like a cooking recipe. At the end of the pipeline, a final step collects the processed data for all the individual subjects and computes summary statistics and so forth.
As a single step can take up to an hour to complete, I would like an automated way to run all the steps for all subjects and compute the summary statistics, without repeating steps that have already completed.
Make seems like a good tool for this, but I need some help with the structure of the Makefile. Here is a simplified example:
```makefile
# Keep intermediate files!
.SECONDARY:

# In this simplified example, there are 3 subjects, in reality there are more
SUBJECTS = subject_a subject_b subject_c

# In this simplified example there are 3 data processing steps, each one taking
# one file as input and emitting one file as output. In reality, there are more
# steps and each step takes multiple input files and emits multiple output
# files.
step1_%.dat : step1.py input_%.dat
	touch step1_$*.dat

step2_%.dat : step2.py step1_%.dat
	touch step2_$*.dat

# Let's say this step produces many output files
STEP3_PROD = step3_%_1.dat step3_%_2.dat step3_%_3.dat
# % is not replaced inside a recipe, so substitute the stem ($*) explicitly
$(STEP3_PROD) : step3.py step2_%.dat
	touch $(subst %,$*,$(STEP3_PROD))

# Meta rule to perform the complete analysis for a single subject
.PHONY : $(SUBJECTS)
subject_% : step1_%.dat step2_%.dat $(STEP3_PROD)
	@echo 'Analysis complete for subject $*.'

# The summary depends on the analysis of all subjects being complete.
summary.dat : summary.py $(SUBJECTS)
	touch summary.dat
	@echo 'All analysis done!'

all : summary.dat
```
The problem with the above Makefile is that the summary step (`python summary.py`) is always performed, even when nothing has changed. This is probably because it depends on the phony `subject_%` rule, which is always built.

Is there a way to structure this Makefile so that the summary step is not performed unnecessarily? Perhaps there is some way to expand `$(STEP3_PROD)` for all subjects?
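The suspicion above matches documented GNU make behaviour: the manual says a phony target should not be a prerequisite of a real target file, because the file's recipe then runs every time make updates it. A minimal reproduction, assuming GNU make and a POSIX shell; `summary.dat` and `subject_a` are stand-ins for the question's targets, and the scratch-directory handling is only for illustration:

```shell
#!/bin/sh
# Minimal reproduction: a real file target that depends on a phony target
# is remade on every run, even when nothing has changed.
set -e

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# printf turns each \t into a real tab, which make requires before recipe lines.
printf 'summary.dat: subject_a\n\ttouch summary.dat\n\n.PHONY: subject_a\nsubject_a:\n\t@:\n' > Makefile

first=$(make)    # builds summary.dat
second=$(make)   # nothing changed, yet summary.dat is remade again

echo "first run:  $first"
echo "second run: $second"
```

Both runs print `touch summary.dat`: the phony prerequisite is never considered up to date, so the file that depends on it is remade each time.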
Upvotes: 2
Views: 1040
Reputation: 18409
Don't overcomplicate things, or it will backfire. Try something like this:
```makefile
.SECONDARY:
all: summary.dat

SUBJECTS:=a b c
SUBJECT_RULES:=$(addprefix subject_, $(SUBJECTS))

.PHONY: $(SUBJECT_RULES)
subject_a: step3_a_1.dat
subject_b: step3_b_1.dat
subject_c: step3_c_1.dat

step1_%.dat: input_%.dat
	touch $@

step2_%.dat: step1_%.dat
	touch $@

step3_%_1.dat: step2_%.dat
	touch $@

STEP3_PRE:=$(addprefix step3_, $(SUBJECTS))
STEP3_1_OUT:=$(addsuffix _1.dat, $(STEP3_PRE))
STEP3_ALL_OUT:=$(STEP3_1_OUT) \
    $(addsuffix _2.dat, $(STEP3_PRE)) \
    $(addsuffix _3.dat, $(STEP3_PRE))

summary.dat: $(STEP3_1_OUT)
	@echo "summary: $(STEP3_ALL_OUT)"
	touch $@
```
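One way to check that this layout no longer re-runs the summary step is to run it repeatedly in a scratch directory. The shell sketch below assumes GNU make and a POSIX shell, uses a trimmed copy of the Makefile above (the `@echo` line and `STEP3_ALL_OUT` are left out), and creates empty placeholder `input_*.dat` files; the variable names `first`, `second` and `third` are illustrative only.

```shell
#!/bin/sh
# Run a trimmed copy of the Makefile above three times and keep the output.
set -e

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Recipe lines are marked with a leading '>' in the heredoc; sed swaps it
# for the tab that make requires.
tab=$(printf '\t')
sed "s/^>/$tab/" > Makefile <<'EOF'
.SECONDARY:
all: summary.dat
SUBJECTS:=a b c
SUBJECT_RULES:=$(addprefix subject_, $(SUBJECTS))
.PHONY: $(SUBJECT_RULES)
subject_a: step3_a_1.dat
subject_b: step3_b_1.dat
subject_c: step3_c_1.dat
step1_%.dat: input_%.dat
>touch $@
step2_%.dat: step1_%.dat
>touch $@
step3_%_1.dat: step2_%.dat
>touch $@
STEP3_PRE:=$(addprefix step3_, $(SUBJECTS))
STEP3_1_OUT:=$(addsuffix _1.dat, $(STEP3_PRE))
summary.dat: $(STEP3_1_OUT)
>touch $@
EOF

touch input_a.dat input_b.dat input_c.dat
first=$(make)     # builds every step for every subject, then summary.dat
second=$(make)    # everything is up to date, so no recipe runs

sleep 1           # make sure the next touch gives a strictly newer timestamp
touch input_b.dat
third=$(make)     # only subject b's chain and summary.dat are rebuilt

printf 'second run:\n%s\n\nthird run:\n%s\n' "$second" "$third"
```

The second run reports that there is nothing to be done, and the third run re-runs only subject `b`'s steps followed by `summary.dat`.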
I see no need to track `step3_%_2.dat` and the other step-3 outputs, since they are rebuilt together with `step3_%_1.dat` anyway.
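The question also asks whether `$(STEP3_PROD)` can be expanded for all subjects. A minimal sketch of that alternative (not the answerer's method) uses GNU make's `foreach` and `subst` text functions. It keeps the question's multi-target pattern rule, which GNU make treats as one recipe that produces all of its targets, builds a hypothetical `ALL_STEP3` list of every subject's step-3 outputs, and makes `summary.dat` depend on those real files instead of on phony targets. Assumptions: GNU make, a POSIX shell, and the `stepN.py`/`summary.py` prerequisites dropped so the example runs without those scripts.

```shell
#!/bin/sh
# Expand the question's STEP3_PROD patterns for every subject and let
# summary.dat depend on the resulting real files.
set -e

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

tab=$(printf '\t')
sed "s/^>/$tab/" > Makefile <<'EOF'
.SECONDARY:
.PHONY: all
SUBJECTS := subject_a subject_b subject_c
STEP3_PROD = step3_%_1.dat step3_%_2.dat step3_%_3.dat
# Hypothetical helper: replace the literal % with each subject name in turn.
ALL_STEP3 := $(foreach s,$(SUBJECTS),$(subst %,$(s),$(STEP3_PROD)))

all: summary.dat

step1_%.dat: input_%.dat
>touch $@
step2_%.dat: step1_%.dat
>touch $@
# A pattern rule with several targets: one run of the recipe makes all three.
$(STEP3_PROD): step2_%.dat
>touch $(subst %,$*,$(STEP3_PROD))

# summary.dat now depends on real files, not on phony targets.
summary.dat: $(ALL_STEP3)
>touch $@
EOF

touch input_subject_a.dat input_subject_b.dat input_subject_c.dat
first=$(make)    # builds all steps for all subjects, then summary.dat
second=$(make)   # expected to run no recipe at all
printf 'first run:\n%s\n\nsecond run:\n%s\n' "$first" "$second"
```

The design choice here is to keep a single source of truth for the step-3 file names: `STEP3_PROD` drives both the pattern rule and the summary's prerequisite list, so adding a fourth output file only means editing one line.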
Upvotes: 1