Reputation: 15318
I've written a small Groovy utility that can unzip Git blob objects, and it works: I can see the content of the blobs. The same works for commits.
The problem is in trees. When I unpack them, I get:
tree 29100644 a�⛲��CK�)�wZ���S�
As you can see, the content after the object size is unreadable. It looks like this content is kept in a different format.
Here is my code:
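import java.util.zip.InflaterOutputStream

// Inflate the zlib-compressed object file into an in-memory buffer
ByteArrayOutputStream result = new ByteArrayOutputStream()
InflaterOutputStream byteWriter = new InflaterOutputStream(result)
byteWriter.write(new File(input).bytes)
byteWriter.close()
// Prints the raw bytes as text, so any binary part of the object looks garbled
println result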
I tried something similar in Ruby and got the same result, so I think the problem is that the tree content is stored in some format other than plain zlib-compressed text.
Upvotes: 4
Views: 1166
Reputation: 1328072
The tree content isn't meant to be a readable string, if I follow the article "Git tree objects, how are they stored?":
The general format is:
- First 4 bytes declaring the object type. In our case, those four bytes are “tree”, ASCII-encoded.
- Then comes a space,
- and then the entries, separated by nothing.
The exact format is the following. All capital letters are “non-terminals” that I’ll explain shortly.
tree ZN(A FNS)*
where:
- N is the NUL character.
- Z is the size of the object in bytes, written as an ASCII decimal number.
- A is the unix access code, ASCII-encoded, for example 100644 for a vanilla file.
- F is the filename (I’m not sure about the encoding. It’s definitely ASCII-compatible), NUL-terminated.
- S is the SHA hash of the entry pointed to, stored as 20 raw bytes rather than as 40 hex characters. This is the part that looks unreadable when printed as text.
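To make the format concrete, here is a minimal Groovy sketch that walks an inflated tree object entry by entry. The names `inflate` and `parseTree` are made up for this illustration, decoding filenames as UTF-8 is an assumption (the article only says the encoding is ASCII-compatible), and there is no handling of truncated input. It also fits the output in the question: `tree 29` matches a single entry of `100644 a` (8 bytes), one NUL and 20 hash bytes.

```groovy
import java.nio.charset.StandardCharsets
import java.util.zip.InflaterInputStream

// Inflate a loose object file into raw bytes (same idea as the question's code).
byte[] inflate(File objectFile) {
    objectFile.withInputStream { new InflaterInputStream(it).bytes }
}

// Hypothetical helper: split an inflated tree object into [mode, name, sha] maps.
List<Map> parseTree(byte[] data) {
    int pos = 0
    while (data[pos] != 0) pos++                 // header "tree <size>" ends at the first NUL
    String header = new String(data, 0, pos, StandardCharsets.US_ASCII)
    if (!header.startsWith('tree ')) {
        throw new IllegalArgumentException('not a tree object: ' + header)
    }
    pos++                                        // skip the NUL after the header

    List<Map> entries = []
    while (pos < data.length) {
        int space = pos
        while (data[space] != 0x20) space++      // the mode is ASCII up to the space
        int nul = space + 1
        while (data[nul] != 0) nul++             // the name runs up to the next NUL
        // The hash is 20 raw bytes, not text, so it must be hex-encoded to be readable.
        byte[] sha = Arrays.copyOfRange(data, nul + 1, nul + 21)
        entries << [
            mode: new String(data, pos, space - pos, StandardCharsets.US_ASCII),
            // Assumption: filenames are decoded as UTF-8.
            name: new String(data, space + 1, nul - space - 1, StandardCharsets.UTF_8),
            sha : sha.encodeHex().toString()
        ]
        pos = nul + 21                           // next entry starts right after the hash
    }
    return entries
}
```

With the `input` variable from the question, something like `parseTree(inflate(new File(input))).each { println "${it.mode} ${it.sha} ${it.name}" }` should list the entries in readable form.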
Here’s an example.
Say we have a directory with two files, called test and test2. The SHA of the directory is f0e12ff4a9a6ba281d57c7467df585b1249f0fa5. You can see the SHA hashes of the entries in the output of
$ git cat-file -p f0e12ff4a9a6ba281d57c7467df585b1249f0fa5
100644 blob 9033296159b99df844df0d5740fc8ea1d2572a84 test
100644 blob a7f8d9e5dcf3a68fdd2bfb727cde12029875260b test2
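The listing above is the same data as the binary entries, only rendered for humans: the 20 raw hash bytes become 40 hex characters and a type column is added. As a rough sketch, one entry could be rendered like this in Groovy. The name `formatEntry` is hypothetical, and deriving the type from the mode alone (40000 for a tree, 160000 for a submodule commit, anything else a blob) is a simplification for illustration.

```groovy
// Hypothetical helper: render one tree entry roughly the way `git cat-file -p` lists it.
String formatEntry(String mode, byte[] sha, String name) {
    // Simplified type lookup based only on the mode stored in the tree.
    String type = mode == '40000' ? 'tree' : (mode == '160000' ? 'commit' : 'blob')
    // Git stores directory modes without the leading zero, so pad to six digits for display.
    // The name is separated from the hash by a tab.
    return mode.padLeft(6, '0') + ' ' + type + ' ' + sha.encodeHex().toString() + '\t' + name
}

// Example with the first entry from the listing, turning the hex hash back into raw bytes.
byte[] rawSha = '9033296159b99df844df0d5740fc8ea1d2572a84'.decodeHex()
println formatEntry('100644', rawSha, 'test')
```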
Upvotes: 6