yuan

Reputation: 2534

Git tree hash id generation

First, initialize a repo containing a file named rose:

$: echo sweet > rose
$: git init
$: git add .
$: find .git/objects/ -type f
.git/objects/aa/823728ea7d592acc69b36875a482cdf3fd5c8d
$: git commit -m "rose"
$: find .git/objects/ -type f -printf "%h%f %s\n"
.git/objects/05b217bb859794d08bb9e4f7f04cbda4b207fbe9 49
.git/objects/aa823728ea7d592acc69b36875a482cdf3fd5c8d 21
.git/objects/665d02ccbacdde1c0f2eecde01fbf47144ddd492 124

Then I want to SHA-1 the blob entry myself to see how the tree object's ID is generated:

echo -e "tree 21\0100644 rose\0aa823728ea7d592acc69b36875a482cdf3fd5c8d"|sha1sum

What it prints is not 05b217bb859794d08bb9e4f7f04cbda4b207fbe9.
Where am I wrong?

Upvotes: 1

Views: 953

Answers (3)

antonio

Reputation: 11110

The format of a tree object is as follows:

tree SIZE\0ENTRIES

SIZE is the length of ENTRIES in bytes, written as a decimal ASCII number.
ENTRIES is a sequence in which each element represents an object referenced by the tree.
Each object entry is formatted as follows:

MODE NAME\0BSHA

MODE is:

  • 100644 for a normal file,
  • 100755 for an executable file,
  • 120000 for a symbolic link,
  • 040000 for a tree object (git ls-tree shows it zero-padded; the raw tree stores it as 40000).

NAME is the directory or file name.
BSHA is the object ID in binary form: 20 raw bytes for SHA-1, not the 40-character hex string.
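
The layout above can be reproduced by hand. What follows is only a minimal sketch in POSIX shell, assuming a tree with a single entry, an ASCII file name, SHA-1 object IDs, and sha1sum being available; hex_to_bin and tree_id are hypothetical helper names, not git commands.

```shell
# hex_to_bin HEX: write the raw bytes for a hex string to stdout.
# (hypothetical helper; expects an even number of hex digits)
hex_to_bin() {
    hex=$1
    case ${#hex} in *[13579]) return 1 ;; esac   # refuse odd-length input
    while [ -n "$hex" ]; do
        rest=${hex#??}         # everything after the first two hex digits
        byte=${hex%"$rest"}    # the first two hex digits
        # Emit the byte through a three-digit octal escape in the printf format.
        printf "\\$(printf '%03o' "0x$byte")"
        hex=$rest
    done
}

# tree_id MODE NAME HEX_ID: print the SHA-1 of a tree holding exactly one entry.
tree_id() {
    mode=$1 name=$2 id=$3
    # SIZE = mode + space + name + NUL + 20 raw ID bytes.
    size=$(( ${#mode} + 1 + ${#name} + 1 + 20 ))
    {
        printf 'tree %d\000' "$size"          # header: "tree SIZE\0"
        printf '%s %s\000' "$mode" "$name"    # entry:  "MODE NAME\0"
        hex_to_bin "$id"                      # entry:  binary object ID
    } | sha1sum | cut -d' ' -f1
}

tree_id 100644 rose aa823728ea7d592acc69b36875a482cdf3fd5c8d
```

For the OP's repository this should print the same ID that git reports for the tree, 05b217bb859794d08bb9e4f7f04cbda4b207fbe9. A real tree with several entries would also need them concatenated in git's sort order, which this sketch does not attempt.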

With respect to the OP's example, let us get the ID of the top-level tree (master branch):

$ git write-tree
05b217bb859794d08bb9e4f7f04cbda4b207fbe9

While I will use this tree-ish, what follows applies to every tree.
Any unambiguous prefix of the ID can be used; here the first 6 characters (05b217) are sufficient.

The tree content in human readable format is given by:

$ git ls-tree 05b217
100644 blob aa823728ea7d592acc69b36875a482cdf3fd5c8d    rose

You may replace git ls-tree with git cat-file -p.

The binary format is similar to that given by:

$ git cat-file tree 05b217
100644 rose ▒▒7(▒}Y*▒i▒hu▒▒▒▒▒\▒%

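The actual stored content also begins with the header tree [content size]\0.
To see it, you can decompress the file that stores the tree inside the .git folder, located using the 2/38 split of the hash (this assumes the object is stored loose and that your openssl build includes zlib support):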

$ openssl zlib -d -in .git/objects/05/b217bb859794d08bb9e4f7f04cbda4b207fbe9
tree 32 100644 rose ▒▒7(▒}Y*▒i▒hu▒▒▒▒▒\▒%
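
The 2/38 split is easy to compute from an object ID. A small sketch in POSIX shell, assuming a loose (unpacked) SHA-1 object; object_path is a hypothetical helper name used only for illustration.

```shell
# object_path HEX_ID: print where git keeps a loose object with this ID.
# (hypothetical helper; packed objects live under .git/objects/pack instead)
object_path() {
    id=$1
    file=${id#??}          # last 38 hex digits: the file name
    dir=${id%"$file"}      # first 2 hex digits: the directory name
    printf '.git/objects/%s/%s\n' "$dir" "$file"
}

object_path 05b217bb859794d08bb9e4f7f04cbda4b207fbe9
```

For the tree in this example, the result is the path passed to openssl zlib -d in the command shown earlier.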

Given the objects stored in the tree and available via ls-tree, one might generate the (actual) stored content with a GNU awk script (-b, patsplit, and strtonum are gawk features):

$ git ls-tree 05b217 | awk -b 'function bsha(asha)\
{patsplit(asha, x, /../); h=""; for(j in x) h=h sprintf("%c", strtonum("0x" x[j])); return(h)}\
{t=t sprintf("%d %s\0%s", $1, $4, bsha($3))} END {printf("tree %s\0%s", length(t), t)}'
tree 32 100644 rose ▒▒7(▒}Y*▒i▒hu▒▒▒▒▒\▒%

To better understand the output, I produce a version of it using escape sequences:

$ git ls-tree 05b217 | awk -b 'function bsha(asha)\
{patsplit(asha, x, /../); h=""; for(j in x) h=h sprintf("%s", "\\x" x[j]); return(h)}\
{t=t sprintf("%d %s\0%s", $1, $4, bsha($3))} END {printf("tree %s\0%s", length(t), t)}'
tree 92 100644 rose \xaa\x82\x37\x28\xea\x7d\x59\x2a\xcc\x69\xb3\x68\x75\xa4\x82\xcd\xf3\xfd\x5c\x8d%     

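Compare this output with the previous output from git ls-tree 05b217. The size reads 92 here only because each ID byte is printed as a four-character escape; the real size is 32.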

I now come to the tree hash generation using different methods.

Using the file stored version of the tree:

$ openssl zlib -d -in .git/objects/05/b217bb859794d08bb9e4f7f04cbda4b207fbe9 | shasum
05b217bb859794d08bb9e4f7f04cbda4b207fbe9 *-

Using my awk-generated content:

$ git ls-tree 05b217 | awk -b 'function bsha(asha)\
{patsplit(asha, x, /../); h=""; for(j in x) h=h sprintf("%c", strtonum("0x" x[j])); return(h)}\
{t=t sprintf("%d %s\0%s", $1, $4, bsha($3))} END {printf("tree %s\0%s", length(t), t)}' | shasum
05b217bb859794d08bb9e4f7f04cbda4b207fbe9 *-

And finally the git mktree command:

$ git ls-tree 05b217 | git mktree
05b217bb859794d08bb9e4f7f04cbda4b207fbe9

The hash obtained is always the same.

Upvotes: 2

knittl

Reputation: 265161

echo appends a newline by default unless you pass the -n (omit newline) flag.

Also, the blob ID is not stored as ASCII hex but as a 20-byte binary value. This results in an object size of 32 (not 21): 6 bytes of mode, a space, 4 bytes of name, a NUL, and 20 bytes of ID.

The following command will give you the correct output:

echo -en 'tree 32\x00100644 rose\x00\xaa\x82\x37\x28\xea\x7d\x59\x2a\xcc\x69\xb3\x68\x75\xa4\x82\xcd\xf3\xfd\x5c\x8d' | sha1sum
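
Since echo -e behaves differently across shells, the same bytes can also be produced with printf and octal escapes. This is only a sketch under the assumption that sha1sum is installed; the variable names are mine. It also counts the payload bytes both ways to show where 32 comes from.

```shell
# The 20 ID bytes of aa823728..., written as octal escapes (aa -> \252, 82 -> \202, ...).
id_bytes='\252\202\067\050\352\175\131\052\314\151\263\150\165\244\202\315\363\375\134\215'

# Payload with the ID as 40 hex characters, as in the question: 11 + 1 + 40 bytes.
ascii_len=$(printf '100644 rose\000%s' aa823728ea7d592acc69b36875a482cdf3fd5c8d | wc -c)

# Payload with the ID as 20 raw bytes, as git stores it: 11 + 1 + 20 bytes.
binary_len=$(printf "100644 rose\\000$id_bytes" | wc -c)

echo "ascii=$((ascii_len)) binary=$((binary_len))"

# Hash the whole object: header "tree 32\0" followed by the binary payload.
tree_hash=$(printf "tree 32\\000100644 rose\\000$id_bytes" | sha1sum | cut -d' ' -f1)
echo "$tree_hash"
```

The escapes are always three digits (\000 rather than \0) so that the digits of 100644 cannot be absorbed into the NUL escape.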

Upvotes: 3

araqnid

Reputation: 133422

The object ID in the tree is not stored in that format. Have a look:

git cat-file tree 05b217bb859794d08bb9e4f7f04cbda4b207fbe9 | od -c

Rather, the tree data is a sequence of <mode> SP <filename> NUL <hash> entries, where <mode> is the mode as an octal ASCII string and <hash> is the 20-octet binary SHA-1.
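
To see those fields without a repository at hand, one can rebuild the single entry from the question and take it apart again. This is a rough sketch in POSIX shell with hypothetical function names; it assumes a one-entry tree whose ID bytes contain no NUL or newline, which holds for this example but not in general.

```shell
# One raw tree entry for the question's file, with the ID bytes as octal escapes.
entry() {
    printf '100644 rose\000\252\202\067\050\352\175\131\052\314\151\263\150\165\244\202\315\363\375\134\215'
}

# <mode> SP <filename>: everything before the first NUL.
entry_header() {
    entry | LC_ALL=C tr '\000' '\n' | head -n 1
}

# <hash>: the final 20 octets, rendered back as 40 hex digits.
entry_id_hex() {
    entry | tail -c 20 | od -An -tx1 | tr -d ' \n'
}

entry_header
entry_id_hex; echo
```

The second line printed is the familiar hex form of the blob ID, which shows that the hex string and the 20 stored octets are the same value in two encodings.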

Upvotes: 3
