peterpi

Reputation: 575

Make: How to process many input files in one invocation of a tool?

I have a data conversion process that is driven by GNU make. It takes human-generated input files and creates output files using a conversion program. Obviously this is about as simple as a makefile can get:

inputs=$(wildcard *.input)
outputs=$(patsubst %.input,%.output,$(inputs))

.PHONY: all
all: $(outputs)

%.output: %.input
    converter $< -o $@

It gets even easier; converter knows the location of the output file from the input file, so we don't need $@:

%.output: %.input
    converter $<

So far, so good. The problem is that converter takes a long time to start up compared with the amount of time to actually process one file. If there are many files to process, there is a lot of wasted time. I'd like to be able to execute converter once, passing in all members of $(inputs) that require conversion.

My current technique is to use eval to populate a list of all the input files that require conversion, and process that list in a later rule:

.PHONY: all
all: single_convert

.PHONY: single_convert
single_convert: $(outputs)
    converter $(newer_inputs)

%.output: %.input
    $(eval newer_inputs+=$<)

This feels like I'm fighting against make though, whereas makefiles normally feel very natural and helpful to me.

My question is: is there a better way? Are there edge cases I've not considered? Is what I'm doing dangerous?

Upvotes: 0

Views: 262

Answers (2)

bobbogo

Reputation: 15483

You want to funnel all .output creation through a single run of convert. One way of achieving this is for each .output file to rely on a simply created .intermediate (say) file.

.DELETE_ON_ERROR: # You always want this

inputs := $(wildcard *.input)
intermediates := ${inputs:.input=.intermediate}
outputs := ${inputs:.input=.output}

${outputs}: single_convert

single_convert: ${intermediates}
    convert ${?:.intermediate=.input}
    touch $@

${intermediates}: %.intermediate: %.input
    touch $@


(See Empty Target Files to Record Events in the manual.)

This works nicely. From scratch:

$ touch 1.input 2.input 3.input
$ make
touch 2.intermediate
touch 1.intermediate
touch 3.intermediate
convert 2.input 1.input 3.input
touch single_convert

It also works incrementally, for instance when just 1 and 2 are out of date:

$ touch 1.input 2.input
$ make
touch 2.intermediate
touch 1.intermediate
convert 2.input 1.input
touch single_convert

This is a bit of a hack, though. You are lying to make, which is never a good idea: you haven't told make how to build, say, file.output. This formulation also precludes parallel operation using the -j flag, which is the whole point of make, IMHO.

This makefile is much simpler:

${outputs}: %.output: %.input
    convert $<

.PHONY: all
all: ${outputs}

with great performance if you have, say, 8 CPUs:

$ make -j 9 all
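
If a single invocation is still preferred, the first makefile can probably be trimmed. The following is a minimal sketch of the same empty-target idea without the .intermediate layer, letting $? (the prerequisites newer than the target) list the changed inputs directly. It assumes that converter accepts several input files in one call, as the question implies, and the stamp name convert.stamp is only illustrative. Note that it will not notice an output file that has been deleted by hand.

```make
# Sketch only; recipe lines must start with a tab character.
# $(wildcard ...) is GNU make; list the files by hand for other makes.
inputs := $(wildcard *.input)

.PHONY: all
all: convert.stamp

# $? expands to the prerequisites that are newer than convert.stamp
# (all of them when the stamp does not exist yet).
convert.stamp: $(inputs)
	converter $?
	touch $@
```

If converter fails, the touch is not reached, so the stamp stays old and the same inputs are offered again on the next run.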

Upvotes: 1

Norman Gray

Reputation: 12514

If I were in this situation, what I'd do is something like the following:

all: convert.stamp

convert.stamp: $(inputs)
  rm -f convert.stamp
  x= ; \
    for f in $(inputs:.input=); do \
      test $$f.output -nt $$f.input || x="$$x $$f.input"; \
      done; \
    converter $$x
  touch convert.stamp

That's quite similar to what you've done, except that the step of obtaining the list of out-of-date files is done in a shell action rather than by using fancy Make features.

There are a couple of reasons for that:

  1. The += syntax you've used is, I think, specific to GNU Make. From long habit, I try to keep makefiles as portable as possible, and avoid make extensions as much as possible (I'm not religious about it, though: the %.ext syntax isn't in POSIX Make, but is nonetheless common and useful enough that it's perverse not to use it). You've implied that you're happy to have a solution specific to GNU Make, so this reason won't have much traction with you.

  2. Separately, it seems clearer to me to have the logic spelled out in this way: it means that each rule is more self-contained, whereas the += device you've used means that the two rules interact in a non-obvious way.

These two points imply that I have a fairly high tolerance of complicated actions, and a low tolerance of complicated Make syntax – your tastes may vary.
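
If you would rather stay with the eval approach from the question, one edge case seems worth guarding against: single_convert is phony, so its recipe runs even when nothing is out of date, and converter would then be called with no arguments. The following is a minimal sketch of a guard, assuming GNU Make and the variable names from the question; whether converter copes with an empty argument list is not something the question states.

```make
# Sketch only, GNU make specific; recipe lines must start with a tab character.
inputs  := $(wildcard *.input)
outputs := $(inputs:.input=.output)
newer_inputs :=

.PHONY: all single_convert
all: single_convert

# This recipe is expanded only after every prerequisite has been dealt with,
# so newer_inputs is complete by then. $(if ...) leaves the line empty,
# and so runs nothing, when no input was recorded.
single_convert: $(outputs)
	$(if $(strip $(newer_inputs)),converter $(newer_inputs))

# Records the stale input instead of converting it.
%.output: %.input
	$(eval newer_inputs += $<)
```

The interaction between the two rules is still there; the guard only removes the empty invocation.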

Key point: You can't get away from some sort of complication of this type. Make is fundamentally about updating files one at a time. In (reasonably) wanting to collapse updates in this way, you are unavoidably subverting Make's model of working: we can't expect that to be natural. That is, I don't think you or I are missing a trick.

Upvotes: 0
