Reputation: 2534
First, initialize a repo containing a file named rose:
$: echo sweet > rose
$: git init
$: git add .
$: find .git/objects/ -type f
.git/objects/aa/823728ea7d592acc69b36875a482cdf3fd5c8d
$: git commit -m "rose"
$: find .git/objects/ -type f -printf "%h%f %s\n"
.git/objects/05b217bb859794d08bb9e4f7f04cbda4b207fbe9 49
.git/objects/aa823728ea7d592acc69b36875a482cdf3fd5c8d 21
.git/objects/665d02ccbacdde1c0f2eecde01fbf47144ddd492 124
Then I want to hash the content myself to see how the tree object's ID is generated:
echo -e "tree 21\0100644 rose\0aa823728ea7d592acc69b36875a482cdf3fd5c8d"|sha1sum
What it prints is not 05b217bb859794d08bb9e4f7f04cbda4b207fbe9.
Where am I going wrong?
Upvotes: 1
Reputation: 11110
The format of a tree object is as follows:
tree SIZE\0ENTRIES
SIZE is the size of ENTRIES in bytes, written as a decimal ASCII number.
ENTRIES is a sequence in which each element represents an object referenced by the tree.
Each object entry is formatted as follows:
MODE NAME\0BSHA
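MODE is the entry's mode as an ASCII octal string (100644 for the regular file rose in this example).
NAME is the directory or file name.
BSHA is the binary (20-byte) representation of the object ID.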
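As a rough sketch of the layout above, the following shell function assembles a one-entry tree byte by byte and hashes it. The name single_entry_tree_id is made up for illustration, and the sketch assumes that sha1sum and mktemp are available, that the repository uses SHA-1, and that the object ID has an even number of hex digits.

```shell
#!/bin/sh
# Hypothetical helper (not a git command): print the ID of a tree that
# contains exactly one entry.
# Usage: single_entry_tree_id MODE NAME HEX_OBJECT_ID
single_entry_tree_id() {
    entries=$(mktemp) || return 1

    {
        # "MODE NAME" followed by the NUL terminator.
        printf '%s %s\0' "$1" "$2"
        # BSHA: turn each pair of hex digits into one raw byte.
        hex=$3
        while [ -n "$hex" ]; do
            rest=${hex#??}            # everything after the first two digits
            pair=${hex%"$rest"}       # the first two digits
            printf "\\$(printf '%03o' "0x$pair")"
            hex=$rest
        done
    } > "$entries"

    # SIZE counts the bytes of ENTRIES only, not the "tree SIZE\0" header.
    size=$(wc -c < "$entries" | tr -d ' ')

    { printf 'tree %s\0' "$size"; cat "$entries"; } | sha1sum | cut -d' ' -f1
    rm -f "$entries"
}

single_entry_tree_id 100644 rose aa823728ea7d592acc69b36875a482cdf3fd5c8d
```

For the OP's repository this should print the same ID that git reports for the tree. The entries go through a temporary file because a shell variable cannot hold NUL bytes.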
With respect to the OP's example, let us get a reference to the top-level tree (master branch):
$ git write-tree
05b217bb859794d08bb9e4f7f04cbda4b207fbe9
I will use this tree-ish, but what follows applies to any tree.
The first 6 characters (05b217) are sufficient to identify it.
The tree content in human readable format is given by:
$ git ls-tree 05b217
100644 blob aa823728ea7d592acc69b36875a482cdf3fd5c8d rose
You may replace git ls-tree with git cat-file -p.
The binary format is similar to that given by:
$ git cat-file tree 05b217
100644 rose ▒▒7(▒}Y*▒i▒hu▒▒▒▒▒\▒%
The actual content also has the initial string tree [content size]\0.
To get it, you can decompress the file that stores the tree inside the .git folder; its path uses the 2/38 split of the hash:
$ openssl zlib -d -in .git/objects/05/b217bb859794d08bb9e4f7f04cbda4b207fbe9
tree 32 100644 rose ▒▒7(▒}Y*▒i▒hu▒▒▒▒▒\▒%
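On systems where openssl zlib is not available, the same bytes can probably be rebuilt from git itself. This is only a sketch: the function name tree_id_via_cat_file is invented here, and it assumes a SHA-1 repository plus sha1sum on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: recompute a tree ID from what git cat-file reports.
# It must be run inside the repository that contains the tree.
tree_id_via_cat_file() {
    {
        # Header: object type, a space, the content size in bytes, then NUL.
        printf 'tree %s\0' "$(git cat-file -s "$1")"
        # Raw entries exactly as stored, binary object IDs included.
        git cat-file tree "$1"
    } | sha1sum | cut -d' ' -f1
}
```

Calling tree_id_via_cat_file 05b217 in the example repository should print the full ID of that tree, since git cat-file -s reports the size of the content without the header.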
Given the objects stored in the tree and available via ls-tree, one might generate the (actual) content stored with an awk script:
$ git ls-tree 05b217 | awk -b 'function bsha(asha)\
{n=patsplit(asha, x, /../); h=""; for(j=1; j<=n; j++) h=h sprintf("%c", strtonum("0x" x[j])); return(h)}\
{t=t sprintf("%d %s\0%s", $1, $4, bsha($3))} END {printf("tree %s\0%s", length(t), t)}'
tree 32 100644 rose ▒▒7(▒}Y*▒i▒hu▒▒▒▒▒\▒%
To better understand the output, I produce a version of it using escape sequences:
$ git ls-tree 05b217 | awk -b 'function bsha(asha)\
{n=patsplit(asha, x, /../); h=""; for(j=1; j<=n; j++) h=h sprintf("%s", "\\x" x[j]); return(h)}\
{t=t sprintf("%d %s\0%s", $1, $4, bsha($3))} END {printf("tree %s\0%s", length(t), t)}'
tree 92 100644 rose \xaa\x82\x37\x28\xea\x7d\x59\x2a\xcc\x69\xb3\x68\x75\xa4\x82\xcd\xf3\xfd\x5c\x8d%
Compare this output with the previous output from git ls-tree 05b217. The size reads 92 here only because each of the 20 ID bytes is written as four characters.
I now come to the tree hash generation using different methods.
Using the file stored version of the tree:
$ openssl zlib -d -in .git/objects/05/b217bb859794d08bb9e4f7f04cbda4b207fbe9 | shasum
05b217bb859794d08bb9e4f7f04cbda4b207fbe9 *-
Using my awk-generated content:
$ git ls-tree 05b217 | awk -b 'function bsha(asha)\
{n=patsplit(asha, x, /../); h=""; for(j=1; j<=n; j++) h=h sprintf("%c", strtonum("0x" x[j])); return(h)}\
{t=t sprintf("%d %s\0%s", $1, $4, bsha($3))} END {printf("tree %s\0%s", length(t), t)}' | shasum
05b217bb859794d08bb9e4f7f04cbda4b207fbe9 *-
And finally the git mktree command:
$ git ls-tree 05b217 | git mktree
05b217bb859794d08bb9e4f7f04cbda4b207fbe9
The hash obtained is always the same.
Upvotes: 2
Reputation: 265161
echo appends a newline by default, unless you specify the -n (omit newline) flag.
Also, the blob ID is not stored in ASCII format but as a binary value. This results in an object size of 32 (not 21): 11 bytes for "100644 rose", 1 for the NUL, and 20 for the binary ID.
The following command will give you the correct output:
echo -en 'tree 32\x00100644 rose\x00\xaa\x82\x37\x28\xea\x7d\x59\x2a\xcc\x69\xb3\x68\x75\xa4\x82\xcd\xf3\xfd\x5c\x8d' | sha1sum
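The points above can be checked without git. The sketch below is only an illustration with made-up variable names. It also shows why \x00 is the safer way to write the NUL here: an escape of the form \0 followed by digits is read as a single octal escape, so the \0100 in the question's command turns into "@".

```shell
#!/bin/sh
# 1. echo appends a newline, which changes the hashed bytes; printf does not.
with_echo=$(echo 'rose' | wc -c | tr -d ' ')       # four letters plus "\n"
with_printf=$(printf 'rose' | wc -c | tr -d ' ')   # four letters only

# 2. "\0100" is ONE octal escape (octal 100 is "@"), not a NUL followed
#    by "100". printf %b expands escapes the same way echo -e does.
mangled=$(printf '%b' 'tree 21\0100644 rose')

# 3. Size of the entry: "100644 rose" + NUL terminator + 20 raw ID bytes.
ascii_part=$(printf '100644 rose\0' | wc -c | tr -d ' ')
entry_size=$((ascii_part + 20))

echo "echo: $with_echo bytes, printf: $with_printf bytes"
echo "mangled header: $mangled"
echo "entry size: $entry_size"
```

The last line reports 32, which is the SIZE used in the corrected command.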
Upvotes: 3
Reputation: 133422
The object ID in the tree is not stored in that format. Have a look:
git cat-file tree 05b217bb859794d08bb9e4f7f04cbda4b207fbe9 | od -c
Rather, the tree data is a sequence of <mode> SP <filename> NUL <hash>, where <mode> is the mode in string form and <hash> is the 20-octet SHA1.
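To see the 20 octets without a repository at hand, here is a small sketch that rebuilds the single entry from the question and reads the hash back out of it. The raw bytes are spelled as octal escapes so that it also works in a plain POSIX sh; the variable names are just for illustration.

```shell
#!/bin/sh
# Rebuild the entry in a temp file: "<mode> SP <filename> NUL" ...
entry=$(mktemp) || exit 1
printf '100644 rose\0' > "$entry"
# ... followed by the 20 raw bytes of the hash (aa 82 37 28 ... 5c 8d).
printf '\252\202\067\050\352\175\131\052\314\151' >> "$entry"
printf '\263\150\165\244\202\315\363\375\134\215' >> "$entry"

# Total size of the entry in bytes.
entry_bytes=$(wc -c < "$entry" | tr -d ' ')

# The last 20 octets are the hash; od turns them back into hex digits.
hash_hex=$(tail -c 20 "$entry" | od -An -v -tx1 | tr -d ' \n')

echo "$entry_bytes bytes, hash $hash_hex"
rm -f "$entry"
```

The hex string recovered this way is the blob ID that git ls-tree shows for rose.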
Upvotes: 3