Sienna

Git pre-commit hook: How can I get added/modified files when committing with the -a flag?

When I use git commit -a to commit my work, "git diff --diff-filter=ACM --name-only --cached" in my pre-commit hook does not show the files that git commit -a is about to add. What's the correct solution for this situation?

Answers (1)

torek

The problem here is git commit -a itself. Your best bet is not to use the -a option: add files separately, then run git commit. If you want to fix up the hook, read on.

How commit really works

People who write Git-hooks (well, at least some of these hooks) need to be aware of the fact that Git builds commits from the index. But this claim—that Git builds commits from the index—is a bit of a lie, or at least, is incomplete. Git builds commits from an index.

If you run git commit without using any of three particular options, there's only one index. That index is the index, so anyone who assumes that Git is working from the index gets the behavior they expect, and a totally naive pre-commit hook behaves well: it's sometimes not even necessary to be aware of the fact that Git is going to commit what's in the index, rather than what's in the user's working tree. But there are three commit options that change this behavior:

  • git commit -a: this acts a lot like git add -u && git commit, except that in order to make sure that the git add -u has no effect if the commit fails (is rejected by the pre-commit hook, or is aborted by the user), Git has to create one temporary index.

  • git commit --include paths: this is similar to git commit -a except that the added files are not those found by git add -u but rather the specified paths.

  • git commit --only paths: this one is the worst case of all. Note that git commit paths, with neither --include nor --only, has the same effect as using --only. For this particular case Git must create not just one but two temporary index files.

All of this follows from the basic idea of the index. Git's index holds, at all times,[1] your proposed next commit. That is, what's in the index is the set of files that should be included in the next commit. When you run git commit with no options, you're asking Git to commit the proposed next commit. So the index has the right stuff in it.

But when you run git commit -a, git commit --include (-i for short), or git commit --only (-o for short), you are saying: take the current proposed next commit, make some changes to it, and try committing that. If this action succeeds, the new index—the one with the extra changes added—should be the updated index. But if this action fails, Git would like to put the index back the way it was, with no changes made.

To achieve this, Git keeps the original index file intact and creates one or two new index files.[2] If you're using git commit -a or git commit -i, Git needs one extra index: it copies the main index to the temporary index, then uses git add, or the internal equivalent, to update the temporary index. This temporary index is named index.lock. The same file also acts as a lock that prevents a second git commit command from running while this one runs, so even a plain git commit with no options creates an index.lock file; with the plain commit, the index.lock contents simply match the index file contents.
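The copy-then-update pattern behind git commit -a can be imitated with the GIT_INDEX_FILE variable described in footnote 2. This is a minimal sketch, assuming a POSIX shell with mktemp, run outside a hook from the top of the work-tree; `preview_commit_a` is a hypothetical name, and this is not how Git implements -a internally. It lists what git commit -a would add or modify while leaving the real index untouched.

```shell
# Hypothetical helper: list the paths "git commit -a" would add or
# modify, without touching the real index.
preview_commit_a() {
    real_index=$(git rev-parse --git-path index) || return 1
    tmp_index=$(mktemp) || return 1
    # The copy plays the role of the one temporary index git commit -a needs.
    if cp "$real_index" "$tmp_index"; then
        # Like "git add -u", but aimed at the copy via GIT_INDEX_FILE.
        GIT_INDEX_FILE=$tmp_index git add -u &&
        GIT_INDEX_FILE=$tmp_index git diff --cached --name-only --diff-filter=ACM
        rc=$?
    else
        rc=1
    fi
    rm -f "$tmp_index"
    return $rc
}
```

After running preview_commit_a, git diff --cached --name-only still shows only what you staged yourself, because only the scratch copy was updated.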

So, for git commit or git commit -a or git commit -i, it's possible to just use the index.lock file as "the" index, and get the right stuff out of it. Of course, if you're going to do this in a pre-commit hook you write, you need to figure out whether Git is using the standard index in the first place: as seen in footnote 2, an added worktree uses a different standard index, so it has a different standard index.lock.
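A minimal sketch of that check, assuming the hook sees the index path in GIT_INDEX_FILE and that the caller already knows the standard index path for its work-tree. The function `index_kind` and its output words are hypothetical. Paths are compared as plain strings, so a relative and an absolute path to the same file would not match.

```shell
# Hypothetical helper: classify the index a pre-commit hook was handed.
#   $1 = value of GIT_INDEX_FILE seen by the hook
#   $2 = path of the standard index for this work-tree
# Prints "plain" for the standard index, "lock" for its index.lock
# (the git commit -a / -i case), and "unknown" for anything else.
index_kind() {
    case $1 in
        "$2")      echo plain ;;   # quoted patterns match literally
        "$2.lock") echo lock ;;
        *)         echo unknown ;;
    esac
}
```

A hook could refuse the commit on "unknown". For "plain" or "lock", git diff --cached inside the hook should read that same file, since Git commands honor GIT_INDEX_FILE from the environment.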


[1] This isn't quite right, because sometimes the index gets expanded to hold entries that are not "stage zero". This is the case during conflicted merges. ("Merge" here includes cherry-pick and revert as well: anything that invokes Git's internal merge engine.) However, even during this expanded operation, the index still holds the proposed next commit. It's just that some parts of the index can't be committed, because they require resolving, and these get in the way of doing the commit. Resolving the conflicted entries removes the nonzero-stage entries, either replacing them with a single resolved stage-zero entry, or just removing them if the file shouldn't be committed after all.

The index contents are visible with git ls-files --stage. Each entry has a mode (100644 or 100755 for ordinary files; two other modes, 120000 and 160000, are reserved for symbolic links and gitlinks), a hash ID, a stage number, and a path name complete with forward slashes, such as src/somefile.ext. The stage number must be zero for the index to be commit-able. Any stage 1, 2, or 3 entries indicate the existence of merge conflicts, and the conflicted versions of a file can be read from those slots: see the git checkout-index command.
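Each line of git ls-files --stage output is the mode, hash ID, and stage number separated by spaces, then a tab and the path, so finding conflicted paths is a small text filter. A sketch, assuming awk is available; `unmerged_paths` is a hypothetical name.

```shell
# Hypothetical helper: read "git ls-files --stage" output on stdin and
# print each path that has a nonzero-stage (conflicted) entry, once.
# Input lines have the form: <mode> SP <hash> SP <stage> TAB <path>
unmerged_paths() {
    awk -F '\t' '{
        split($1, meta, " ")   # meta[1]=mode, meta[2]=hash, meta[3]=stage
        if (meta[3] != "0" && !seen[$2]++) print $2
    }'
}
```

In a repository you would run git ls-files --stage | unmerged_paths. Git can also list just the conflicted entries directly with git ls-files --unmerged.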

[2] The index is, or mostly is, just an ordinary file: the usual file is .git/index. In secondary work-trees created by git worktree add, the usual file is in a per-work-tree subdirectory of the main .git directory (.git/worktrees/<name>/index). You can, however, override this index with your own temporary index, using the environment variable GIT_INDEX_FILE. Various Git shell scripts use this technique; for instance, git stash did this when it was a shell script. And of course, git commit uses this same concept to create its extra temporary index files.


git commit --only is the hardest case

For git commit --only, two index files are not sufficient: we need three (the original index plus two temporary ones). Here's why. The function of git commit --only is to:

  1. Read the current commit (HEAD) into a temporary index.
  2. Update that temporary index to add the specified files.
  3. Attempt to turn this index into a new commit.

Step 3 has a success case and a failure case. The failure case is simpler, so let's look at it first:

  • On failure, Git should go back to the currently-proposed-next-commit as the proposed-next-commit. So that means we need to keep the existing index around.

  • On success, however, Git should come up with a new proposed-next-commit. This new proposed next commit should consist of the current proposed-next-commit (i.e., the current index) updated as if by git add of the named files.

In order to prepare for success in step 3, then, steps 1 and 2 should read instead this way:

  1. Prepare two temporary index files: set up Index A by copying HEAD into an index file, and set up Index B by copying the existing index file.
  2. Update Index A by git add-ing the named files, and update Index B by git add-ing the named files.

Step 3 is now simplified:

  3. Use Index A to make a commit. If this succeeds, replace the standard index with Index B. If it fails, remove both temporary index files.
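The three steps can be approximated with plumbing commands and GIT_INDEX_FILE. This is a minimal sketch, not Git's implementation: `commit_only` is a hypothetical name, it assumes a POSIX shell, a repository with at least one commit, and that it runs from the top of the work-tree, and it skips hooks and index locking. The scratch directory lives inside the Git directory so that the final mv is a rename within one file system.

```shell
# Hypothetical sketch of "git commit --only <paths>" built from plumbing.
# Usage (from the top of the work-tree): commit_only "message" path...
commit_only() {
    msg=$1; shift
    git_dir=$(git rev-parse --git-dir) || return 1
    real_index=$(git rev-parse --git-path index) || return 1
    scratch=$(mktemp -d "$git_dir/only.XXXXXX") || return 1
    index_a=$scratch/index-a   # HEAD + named paths: this gets committed
    index_b=$scratch/index-b   # current index + named paths: kept on success

    # Step 1: Index A from HEAD (read-tree creates the missing file);
    #         Index B as a copy of the existing index.
    GIT_INDEX_FILE=$index_a git read-tree HEAD &&
    cp "$real_index" "$index_b" &&
    # Step 2: git add the named paths to both temporary indexes.
    GIT_INDEX_FILE=$index_a git add -- "$@" &&
    GIT_INDEX_FILE=$index_b git add -- "$@" &&
    # Step 3: commit from Index A; on success, Index B replaces the index.
    tree=$(GIT_INDEX_FILE=$index_a git write-tree) &&
    commit=$(printf '%s\n' "$msg" | git commit-tree "$tree" -p HEAD) &&
    git update-ref HEAD "$commit" &&
    mv -f "$index_b" "$real_index"
    rc=$?
    # On failure the original index was never modified; just clean up.
    rm -rf "$scratch"
    return $rc
}
```

For example, with a staged change to b and an unstaged change to a, commit_only "only a" a commits just a, and afterwards b is still staged, which is the behavior described for git commit --only.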

How Git uses lock files

Git has a bunch of code paths that would like to make some sort of atomic change to a single file.[3] That's where this index.lock stuff comes from. On a POSIX system, there's no particularly good way to lock a file for a particular transaction, but we can approximate it in various ways.

One way is pretty simple: use atomic file creation (O_CREAT|O_EXCL in open system calls) to make sure that only one process can create a file whose name ends with .lock. For instance, if we want to lock the file named index, we create, atomically, a file named index.lock. If the creation succeeds, we now have the lock, and can copy the existing index file to the new index.lock file, make any necessary changes to the file, and write them out.
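The same create-exclusively idea is available from a shell script: with the noclobber option (set -C), a > redirection fails if the file already exists, and common shells implement this with O_CREAT|O_EXCL when the file is absent. A sketch under that assumption; `acquire_lock` is a hypothetical name, and Git itself does this in C.

```shell
# Hypothetical helper: try to take the lock for file $1 by creating
# "$1.lock" exclusively. Under set -C the ">" redirection fails if the
# lock file already exists, so at most one caller succeeds.
# Returns 0 if we now hold the lock. The subshell keeps set -C local.
acquire_lock() {
    ( set -C; : > "$1.lock" ) 2>/dev/null
}
```

Once acquire_lock index succeeds, the caller would copy index into index.lock and make its changes to that copy.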

We can now:

  • Atomically update the index file, releasing the lock file, with a rename system call: rename("index.lock", "index") will either completely replace the old index file with the current index.lock file and succeed, removing the index.lock file in the process, or else will fail and will leave both index and index.lock undisturbed. (On failure, we'll go on to abort the transaction; see below.)

  • Or, we can release the lock on the file, aborting our transaction, on purpose, by simply removing the lock file (unlink("index.lock")). The existing index file remains undisturbed.
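The two ways of ending the transaction map onto mv, which renames when source and target are on the same file system, and rm. Another minimal sketch with hypothetical names, assuming the lock file sits in the same directory as the file it protects:

```shell
# Hypothetical helpers for finishing a lock-file transaction on file $1.
# commit_lock: rename "$1.lock" over "$1"; the old file is replaced in
#   one step and the lock file disappears with the rename.
# rollback_lock: delete "$1.lock"; "$1" itself is never touched.
commit_lock()   { mv -f "$1.lock" "$1"; }
rollback_lock() { rm -f "$1.lock"; }
```

A full transaction is then: create the lock file exclusively, copy the file into it, edit the copy, and finish with either commit_lock or rollback_lock.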

Note how this technique seamlessly accomplishes both git commit and git commit -a / git commit -i. The key difference between these two operations is controlled entirely by what contents we put in index.lock. For a plain git commit, both index and index.lock contain the same content. For git commit -a or git commit -i, index contains the old content, and index.lock contains the new updated content.

We can create the lock file, update it if appropriate, attempt the commit, and then either finish the transaction by renaming, or roll back the transaction by unlinking the lock file. This is all very straightforward and easy.[4]

The hard case is git commit -o: the --only option requires two temporary index files. We leave index alone, create index.lock with one set of content (index B, since it's what we want to have in place after the rename operation), and create our third index, index A, for the duration of the committing process. We then read HEAD into index A, update both index A and index B, attempt the commit using index A, remove index A, and then either finish the transaction with index B or roll it back as before. This is less straightforward, but it's still clear that it works.


[3] See the Wikipedia page on atomicity in databases: that's the concept Git is attempting to achieve here, an atomic transaction. Real database software might benefit Git; what Git does is kind of crude. However, real database software is (a) hard and (b) slow, so Git attempts a sort of have-your-cake-and-eat-it-too approach. It mostly succeeds: there are real tradeoffs, and Git manages most of them pretty well. They're now breaking down in various cases, though, and work on this is ongoing.

[4] "Easy" here means only several dozen lines of C code. If Git were written in a higher-level language, it really would be relatively easy, though.


Writing a pre-commit hook that handles all these cases

Here, you're just plain in trouble. With the git commit --only case, what's going to be committed is in Index A. But the two files whose paths you can know are Original Index ($GIT_INDEX_FILE, if that's set, or .git/index or the appropriate work-tree index), and Index B (same file as before plus the .lock suffix).

You can determine whether there are at least two different index files in play. If there are, Git is running git commit -a, git commit -i, or git commit -o. Since you can't handle these cases reliably, your pre-commit hook can abort and tell the user not to use those options.

Since none of this is documented, there's no official way to do this, but some existing pre-commit hooks use this technique:

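# Quote the variable, and treat an unset GIT_INDEX_FILE as the default index.
if [ "${GIT_INDEX_FILE:-.git/index}" != ".git/index" ]; then
    echo "Error: non-default index file is being used (GIT_INDEX_FILE=$GIT_INDEX_FILE)." >&2
    # ...
    exit 1
fi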

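This has the annoying side effect of rejecting commits from added work-trees, though. To fix it, if your Git is new enough to have git rev-parse --git-path, replace any hardcoded .git/index strings with the output of the command below. Run it with GIT_INDEX_FILE unset: --git-path takes GIT_INDEX_FILE into account, so inside the hook it would otherwise just report the hook's own index file.

(unset GIT_INDEX_FILE; git rev-parse --git-path index)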
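Putting these pieces together, here is a minimal pre-commit sketch, assuming a POSIX shell and a Git that has git rev-parse --git-path. It depends on the undocumented behavior discussed above, so it may need adjusting for your Git version, and the function names are hypothetical. Note that it asks rev-parse for the standard index with GIT_INDEX_FILE unset.

```shell
#!/bin/sh
# Hypothetical pre-commit helper: allow the commit only when Git hands the
# hook the standard index for this work-tree (a plain "git commit"), then
# print the added/copied/modified paths that will be committed.

standard_index() {
    # Subshell so the unset does not leak. Without it, rev-parse --git-path
    # would report $GIT_INDEX_FILE itself instead of the standard path.
    (unset GIT_INDEX_FILE; git rev-parse --git-path index)
}

check_index() {
    std=$(standard_index) || return 1
    # An unset GIT_INDEX_FILE is treated as the standard index.
    if [ "${GIT_INDEX_FILE:-$std}" != "$std" ]; then
        echo "Error: non-default index file is being used ($GIT_INDEX_FILE)." >&2
        echo "Stage files with git add, then run plain git commit." >&2
        return 1
    fi
    git diff --cached --name-only --diff-filter=ACM
}

# In .git/hooks/pre-commit you might end with:
#   files=$(check_index) || exit 1
```

Because the comparison is a plain string match, a hook that sees relative and absolute spellings of the same path would need to normalize them first.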

As you've observed, some versions of Git don't create index.lock when it's not necessary. This is the problem with relying on undocumented behavior: it may work in the version of Git you have installed right now, and then break when you upgrade to a newer version of Git.
