Reputation: 2923
I'm trying to set up an ETL system using GNU Make 3.81. The idea is to transform and load only what is necessary after a change to my source data.
My project's directory layout looks like this:
${SCRIPTS}/ <- transform & load scripts
${DATA}/incoming/ <- storage for extracted data
${DATA}/processed/ <- transformed, soon-to-be-loaded data
My ${TRANSFORM_SCRIPTS}/Makefile is filled with statements like this:
A_step_1: ${SCRIPTS}/A/do_step_1.sh ${DATA}/incoming/A_files/*
	${SCRIPTS}/A/do_step_1.sh ${DATA}/incoming/A_files/* > ${DATA}/processed/A.step_1
A_step_2: ${SCRIPTS}/A/do_step_2.sh ${DATA}/processed/A.step_1
	${SCRIPTS}/A/do_step_2.sh ${DATA}/processed/A.step_1 > ${DATA}/processed/A.step_2
B_step_1: ${SCRIPTS}/B/do_step_1.sh ${DATA}/incoming/B_files/*
	${SCRIPTS}/B/do_step_1.sh ${DATA}/incoming/B_files/* > ${DATA}/processed/B.step_1
B_step_2: ${SCRIPTS}/B/do_step_2.sh ${DATA}/processed/B.step_1
	${SCRIPTS}/B/do_step_2.sh ${DATA}/processed/B.step_1 > ${DATA}/processed/B.step_2
joined: A_step_2 B_step_2
	join ${DATA}/processed/A.step_2 ${DATA}/processed/B.step_2 > ${DATA}/processed/joined
Calling `make joined` successfully produces the "joined" file I need, but it rebuilds every file every time, even though none of the dependency files have changed.
I tried using the output file names as targets, but GNU Make doesn't seem to know how to cope:
${DATA}/processed/B.step_2: ${SCRIPTS}/B/do_step_2.sh ${DATA}/processed/B.step_1
	${SCRIPTS}/B/do_step_2.sh ${DATA}/processed/B.step_1 > ${DATA}/processed/B.step_2
Any suggestions other than dropping the output of each process in the current working directory? Make seems like a reasonable tool for this work because, in reality, there are tens of data sources and close to 100 steps altogether, and managing dependencies myself via script files is becoming too difficult.
Upvotes: 2
Views: 284
Reputation: 22402
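You can do one of two things. Make decides whether to rebuild a target by comparing the timestamp of the file with that name against its prerequisites; targets such as A_step_2 and joined never exist as files, so they are always considered out of date.
Either make the target and its dependencies the real output files, with something like: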
JOINED=${DATA}/processed/joined
$(JOINED): ${DATA}/processed/A.step_2 ${DATA}/processed/B.step_2
Or keep the current target names and end each step's recipe with a
touch $@
for example:
A_step_2: ${SCRIPTS}/A/do_step_2.sh ${DATA}/processed/A.step_1
	${SCRIPTS}/A/do_step_2.sh ${DATA}/processed/A.step_1 > ${DATA}/processed/A.step_2 && touch $@ || $(RM) $@
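Do the same for every step, including the joined step. This works because touch leaves an empty file named after the target in the directory where make runs, which gives make a timestamp to compare, but it is ugly.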
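For the first option, here is a rough sketch of how the whole chain might look with the real output files as targets. It assumes SCRIPTS and DATA are set the same way as in the question; the PROCESSED variable, the phony joined alias, and .DELETE_ON_ERROR are my own additions for illustration, not part of the original Makefile. Recipe lines must start with a tab.

```makefile
# Assumption: SCRIPTS and DATA come from the environment or the command line,
# as they appear to in the question.
PROCESSED := ${DATA}/processed   # hypothetical helper variable

# Remove a half-written output file if its recipe fails, so a later run
# does not mistake it for an up-to-date target.
.DELETE_ON_ERROR:

# Alias so that `make joined` keeps working. It names no real file and has
# no recipe; it only pulls in the real target below.
.PHONY: joined
joined: $(PROCESSED)/joined

# Each target is the file its recipe writes, so make can compare timestamps.
$(PROCESSED)/A.step_1: ${SCRIPTS}/A/do_step_1.sh ${DATA}/incoming/A_files/*
	${SCRIPTS}/A/do_step_1.sh ${DATA}/incoming/A_files/* > $@

$(PROCESSED)/A.step_2: ${SCRIPTS}/A/do_step_2.sh $(PROCESSED)/A.step_1
	${SCRIPTS}/A/do_step_2.sh $(PROCESSED)/A.step_1 > $@

$(PROCESSED)/B.step_1: ${SCRIPTS}/B/do_step_1.sh ${DATA}/incoming/B_files/*
	${SCRIPTS}/B/do_step_1.sh ${DATA}/incoming/B_files/* > $@

$(PROCESSED)/B.step_2: ${SCRIPTS}/B/do_step_2.sh $(PROCESSED)/B.step_1
	${SCRIPTS}/B/do_step_2.sh $(PROCESSED)/B.step_1 > $@

# $^ expands to all prerequisites in order: A.step_2 then B.step_2.
$(PROCESSED)/joined: $(PROCESSED)/A.step_2 $(PROCESSED)/B.step_2
	join $^ > $@
```

With something like this, running make joined a second time with no changes should do nothing, and touching a file under incoming/A_files should rebuild only the A chain and the join. One caveat: every prerequisite in the chain has to be written as the same path the producing rule uses as its target, otherwise make will not connect the steps.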
Upvotes: 2