Reputation: 21
I've been using 'git rev-parse HEAD:' to calculate the hash of a folder in a worktree. This is basically the same behavior as 'git ls-tree :'.
This calculates the hash not of the current worktree but of a specific commit (HEAD in my case), so changes in the worktree (modified, new, deleted, or staged files) are not part of the calculation.
Now I want to change my logic to include these changes: to calculate the hash of a folder from the current state of the worktree rather than from a commit, preferably using the same logic as ls-tree (because we've used this code so far and want to maintain compatibility).
How can this be done? I would very much appreciate any help.
Upvotes: 0
Views: 233
Reputation: 487885
You're starting with a misconception: Git does not store folders, and therefore does not hash folders. You might still be able to do what you want though.
Git stores:
file contents (as "blob objects"): the hash ID of a blob object is the checksum of the word blob, a space, the decimalized size of the file in bytes, a NUL byte, and the file bytes (in that order, with everything treated as 8-bit bytes, i.e., in Python you'd use f"blob {len(data)}\0".encode() + data as the input to the hasher);
tree objects (which store name, mode, and hash tuples): these are how file names and blob hashes wind up being stored in commits, although there are complications here: sort order in particular matters and the names are broken into components;
commit objects; and
annotated tag objects.
As with blob objects, tree, commit, and annotated tag objects have headers at the front consisting of the type, a space, a size (decimalized ASCII numeric representation), and a NUL byte. The type-strings for these three are tree, commit, and tag respectively.
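To make the header format concrete, here is a minimal Python sketch of the hashing rule described above. It assumes a SHA-1 repository (not a SHA-256 one), and the helper names git_object_id and tree_body are made up for this illustration; they are not part of Git or of any library. The tree serializer only shows the entry layout and sort order for one level of a tree, so it is nowhere near a full replacement for git write-tree.

```python
import hashlib


def git_object_id(obj_type, body):
    """Hash "<type> <size>\\0<body>" the way Git does (SHA-1 repositories assumed)."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """Serialize one tree level from (mode, name, hex_id) tuples.

    Hypothetical helper: each entry becomes "<mode> <name>\\0" followed by the
    raw 20-byte hash. Entries are sorted by name bytes, with sub-tree names
    (mode "40000") compared as if they ended in "/".
    """
    def sort_key(entry):
        mode, name, _ = entry
        raw = name.encode()
        return raw + b"/" if mode == "40000" else raw

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        body += f"{mode} {name}\0".encode() + bytes.fromhex(hex_id)
    return body


if __name__ == "__main__":
    print(git_object_id("blob", b""))         # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    print(git_object_id("blob", b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
    print(git_object_id("tree", b""))         # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

The three printed values are the well-known IDs of the empty blob, a blob containing "hello" plus a newline, and the empty tree, so they can be cross-checked against git hash-object in any SHA-1 repository.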
As you note, the result of git rev-parse HEAD: is the hash ID of the tree object stored in the HEAD commit. You can build a tree object from whatever is in Git's index using git write-tree, although the index must contain all the desired file blobs and path names, and must not contain any merge conflicts at this time.
To compute what the hash ID would be for some tree, create an empty index,[1] add that tree's files to that empty index, and use git write-tree to create a tree object from that index. This tree object will be stored into the repository. If you wind up never using it for anything, this is a bit wasteful, but Git's GC will eventually collect it, if you're operating the system normally. Because of the ordering and component-ization issues with building tree objects, this is the only way to do it directly within Git.
In shell script, you might use the following (note that this is entirely untested):
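export GIT_INDEX_FILE=/tmp/index.test.$$
rm -f "$GIT_INDEX_FILE"    # a missing index file counts as an empty index
trap 'rm -f "$GIT_INDEX_FILE"' 0 1 2 3 15    # remove the temporary index on exit or signal
git add .    # stage the current work-tree into the temporary index
git write-tree    # write the tree object and print its hash ID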
The stdout from this command sequence is the hash ID of the tree (printed by git write-tree).
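For comparison, the same temporary-index idea can be sketched in Python with subprocess. This is only a sketch under a few assumptions: worktree_tree_id is a made-up name, repo_dir is assumed to be the top level of the work-tree, and files excluded by .gitignore are not part of the result because git add skips them. The optional prefix argument relies on git write-tree --prefix=<prefix>/, which writes the tree for a single subdirectory and so may be the closest match to hashing one folder.

```python
import os
import subprocess
import tempfile


def worktree_tree_id(repo_dir, prefix=None):
    """Return the tree hash for the current work-tree state (hypothetical helper).

    repo_dir is assumed to be the top level of the work-tree. If prefix is
    given, only the tree for that subdirectory is written and returned.
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Point Git at an index path that does not exist yet: a missing index
        # file is treated as an empty index, while a zero-length file is not.
        env = dict(os.environ, GIT_INDEX_FILE=os.path.join(tmp, "index"))
        # Stage every non-ignored file from the work-tree into the temporary index.
        subprocess.run(["git", "add", "-A", "."], cwd=repo_dir, env=env, check=True)
        cmd = ["git", "write-tree"]
        if prefix:
            cmd.append("--prefix=" + prefix.rstrip("/") + "/")
        # write-tree stores the tree object(s) in the repository and prints the hash.
        result = subprocess.run(cmd, cwd=repo_dir, env=env, check=True,
                                capture_output=True, text=True)
        return result.stdout.strip()
```

Called as worktree_tree_id("/path/to/repo", "some/folder"), with both paths standing in for your own, this should give a value comparable to what git rev-parse HEAD:some/folder reports once the same content is committed. The real index of the repository is never touched, since every command runs with its own GIT_INDEX_FILE.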
If you'd like to do it in a programming language, see my Python code that does it, but note all the limitations.
[1] Git doesn't actually tolerate an empty index, but considers a non-existent index file as existing-but-empty. Hence the rm -f as the line to "create" the "empty index". It might be good to put the index file into git rev-parse --git-dir rather than /tmp, and/or to use mktemp rather than just assume that index.test.<pid> is unique.
Upvotes: 2