BeeOnRope

Reputation: 64955

Ensure that generated yet checked-in files aren't rebuilt after git sync

I'm not sure whether this is a git question or a make question, or maybe both, but here goes...

Imagine you have a file foo.cpp which is generated by a generator program, gen.py. If the generator program is modified, it must be run again to generate a fresh version of foo.cpp.

A makefile rule that encodes this behavior could be as follows [1]:

foo.cpp: gen.py
    gen.py

For active development on gen.py, this is ideal I think.

However, consider the case where you make your project available remotely via git, say on GitHub or some similar site. Users will download and make your project.

The traditional way of handling generated files is not to check them in at all: don't include foo.cpp in git (add it to your `.gitignore`), and when the user builds your project, the file will be generated.

However, an alternative approach is to include the generated file foo.cpp in git. This raises the possibility that gen.py and foo.cpp might be "out of sync" [2] if one is changed without updating the other, but it has a few other advantages:

I'm not interested in litigating the merits of checking in generated files versus always generating them locally. You may assume that for some project the decision has been taken to check in generated files, and this decision is not up for debate.

In that context then, my question is:

For a user who freshly clones a project, or who syncs a commit involving both files, how can I ensure that make doesn't try to generate foo.cpp? By default git uses the current time when it writes files during a checkout, so foo.cpp and gen.py end up with essentially the same timestamp, and foo.cpp may be rebuilt whenever gen.py happens to be the newer of the two.
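To make the timestamp problem concrete, here is a minimal bash sketch of the staleness rule that make applies to a single target and prerequisite. The function name `is_stale` is hypothetical and only illustrates the comparison; real make handles many more cases (multiple prerequisites, phony targets, and so on).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of make or git): approximates make's basic
# rule for one target and one prerequisite.
# Returns 0 (stale) if the target is missing or the prerequisite is newer.
is_stale() {
    local target=$1 prereq=$2
    [ ! -e "$target" ] || [ "$prereq" -nt "$target" ]
}

# Example, to be run inside a fresh checkout:
# if is_stale foo.cpp gen.py; then echo "foo.cpp would be rebuilt"; fi
```

After a fresh clone, whether this check reports foo.cpp as stale depends only on which of the two files git happened to write last, not on their history.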

I can't ask users to change their git configuration.


[1] Perhaps there would be additional dependencies for foo.cpp which would also appear as prerequisites, but this is the "base case".

[2] A reasonable approach is to enforce that they are in sync via a git hook.

Upvotes: 2

Views: 108

Answers (2)

Renaud Pacalet

Reputation: 29212

A post-merge hook could possibly solve your issue:

  1. Add a hooks/post-merge script to your repository containing, e.g. (requires bash 4 or later, for associative arrays):

    $ cat hooks/post-merge
    #!/usr/bin/env bash
    printf 'running post-merge\n'
    declare -A generated=()
    generated["foo.cpp"]="gen.py"
    # add other (generated, generator) pairs to the generated associative array
    for g in "${!generated[@]}"; do
        s="${generated[$g]}"
        if ! git diff ORIG_HEAD HEAD --exit-code -- "$g" "$s" &> /dev/null; then
            printf 'touch %s\n' "$g"
            touch "$g"
        fi
    done
    $ chmod +x hooks/post-merge
    
  2. Add a runme.sh script and ask your users to run it manually after cloning the repository (they will have to really trust you, but security is out of scope for this answer):

    $ cat runme.sh
    #!/usr/bin/env bash
    for s in hooks/*; do
        ln -sf ../../"$s" .git/hooks
    done
    

For every (generated, generator) pair, if either of the two gets modified by a git pull, the post-merge hook will touch the generated file so that its last modification timestamp is newer than the generator's.

Of course, this is only a starting point; there are probably several other things to consider (rebase, ...).

Upvotes: 0

MadScientist

Reputation: 100856

You can compare the output and only update the target if it's different. Say, for example, your gen.py writes its output to stdout, so normally you'd have a rule like this:

foo.cpp: gen.py
        ./gen.py > $@

You can change this to:

foo.cpp: gen.py
        ./gen.py > $@.tmp
        cmp -s $@ $@.tmp && rm -f $@.tmp || mv -f $@.tmp $@

The downside here is that until you DO update foo.cpp, the gen.py recipe will always be run. However, no targets that depend on foo.cpp will be rebuilt, because its timestamp hasn't changed.
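The same compare-then-replace idea can be tried outside of make. The following is a minimal bash sketch, assuming the generator writes to stdout; the function name `update_if_changed` and the `.tmp` suffix are illustrative choices, not something make or git provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a generator command, but replace the target only
# if the new output differs, so an unchanged target keeps its old timestamp.
# Usage: update_if_changed TARGET COMMAND [ARGS...]
update_if_changed() {
    local target=$1
    shift
    local tmp="$target.tmp"
    # Write the fresh output to a temporary file next to the target.
    "$@" > "$tmp" || { rm -f "$tmp"; return 1; }
    if [ -e "$target" ] && cmp -s "$target" "$tmp"; then
        rm -f "$tmp"              # identical: leave the target untouched
    else
        mv -f "$tmp" "$target"    # new or different: install the new output
    fi
}

# Example: update_if_changed foo.cpp ./gen.py
```

Because the target is only moved into place when its content differs, anything that depends on it sees a new timestamp only after a real change.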

If that's expensive and you want to get around that you'll have to do something much more complicated: if the comparison is true (no difference) then you need to reset the timestamp on gen.py so that it's the same as the target; that ensures the target will no longer be considered out of date going forward. Something like this:

foo.cpp: gen.py
        ./gen.py > $@.tmp
        if test -e $@ && cmp -s $@ $@.tmp; then \
            rm -f $@.tmp; \
            touch -r $@ gen.py; \
        else \
            mv -f $@.tmp $@; \
        fi
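The key step in that recipe is `touch -r`, which copies the modification time of a reference file onto another file. A small bash sketch of just that step follows; `match_mtime` is a hypothetical name used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: give FILE the same modification time as REF, so that
# FILE is no longer newer than REF from make's point of view.
# Usage: match_mtime REF FILE
match_mtime() {
    local ref=$1 file=$2
    touch -r "$ref" "$file"   # -r: take the timestamp from the reference file
}

# Example: match_mtime foo.cpp gen.py
```

Only the timestamp of gen.py changes here; its content is left alone.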

If you can't easily change the output filename for gen.py then you'll have to do something even more disgusting, like renaming $@ to a temporary name before you run ./gen.py, then changing it back or using the new one as appropriate.

ETA: If you just want to avoid running gen.py if it's the same version, this is more a Git question than a makefile question, because what you're really asking is how to know whether these files are in the same Git commit. That's easy, but as my comment below says, I don't think this is actually right:

foo.cpp: gen.py
        tver=$$(git log -n1 --oneline $@ 2>/dev/null); \
        pver=$$(git log -n1 --oneline $< 2>/dev/null); \
        if test -z "$$tver" || test -z "$$pver" || test "$$tver" != "$$pver"; then ./gen.py; fi

Upvotes: 2
