michael nesterenko

Reputation: 14439

git commit integrity

I tried to search for what a git commit consists of and which parts go into the commit's SHA hash, but I was probably using the wrong search terms and found nothing.

I wonder what a commit consists of. I read a bit of the community book, and it has the following image:

[image: diagram of a commit object from the community book]

However, I think a commit has more fields than the image shows.

Now the main question: which fields take part in the commit's SHA hash? I am asking because I got two commits in different repositories with the same SHA hash but different parent commits. Until now I thought two commits could have the same SHA hash only if they are identical and have the same parent commit. So I am a bit confused.

I have two local repositories (git1, git2); one is a clone of the other.

git1

commit 4f438f9579939312689eb67e5fb7957d87cfa036 <-- this commit
Author: Michael Nesterenko <[email protected]>
Date:   Mon Jun 25 00:00:31 2012 +0300

    stuff after change

commit e91e833158bb44f54f418cc5c3e1832452051428
Merge: dc69dc2 0b5912b
Author: Michael Nesterenko <[email protected]>
Date:   Mon May 7 02:09:18 2012 +0300

    Merge branch 'master' of e:/temp/git2

    Conflicts:
        file.file

commit 0b5912bd1a1cb9b78410fe5c0dc67845ca1deec5
Author: Michael Nesterenko <[email protected]>
Date:   Mon May 7 02:06:46 2012 +0300

    c8

commit dc69dc25a1e0c9067cbca19fe6a1d078a19138a0
Author: Michael Nesterenko <[email protected]>
Date:   Mon May 7 02:06:29 2012 +0300

    c7

commit f6d88da1ecc3106f6debe1eac80d4b02705bcecf
Merge: d1a3c38 6134e66
Author: Michael Nesterenko <[email protected]>
Date:   Mon May 7 02:05:05 2012 +0300

    Merge branch 'master' of e:/temp/git1

    Conflicts:
        file.file

commit d1a3c389416ff88e195e93def9a956fad1e63819
Author: Michael Nesterenko <[email protected]>
Date:   Mon May 7 02:03:47 2012 +0300

git2

commit e1ee3b2756d4d8440ae3661df3fb3ec9af7cd55a
Merge: 4296e1b 4f438f9
Author: Michael Nesterenko <[email protected]>
Date:   Mon Jun 25 00:01:30 2012 +0300

    Merge branch 'master' of e:/temp/git1

    Conflicts:
        file.file

commit 4f438f9579939312689eb67e5fb7957d87cfa036 <-- this commit
Author: Michael Nesterenko <[email protected]>
Date:   Mon Jun 25 00:00:31 2012 +0300

    stuff after change

commit 4296e1bd046c4008166cfc516ef5ee2ce98a27d1
Author: Michael Nesterenko <[email protected]>
Date:   Sun Jun 24 23:57:14 2012 +0300

    more stuff

commit e91e833158bb44f54f418cc5c3e1832452051428
Merge: dc69dc2 0b5912b
Author: Michael Nesterenko <[email protected]>
Date:   Mon May 7 02:09:18 2012 +0300

    Merge branch 'master' of e:/temp/git2

    Conflicts:
        file.file

commit 0b5912bd1a1cb9b78410fe5c0dc67845ca1deec5
Author: Michael Nesterenko <[email protected]>
Date:   Mon May 7 02:06:46 2012 +0300

    c8

commit dc69dc25a1e0c9067cbca19fe6a1d078a19138a0
Author: Michael Nesterenko <[email protected]>
Date:   Mon May 7 02:06:29 2012 +0300

These commits appear to have different parents but the same SHA hash.

Upvotes: 1

Views: 319

Answers (1)

CB Bailey

Reputation: 792059

It is vanishingly improbable that two commits in clones of the same repository have the same id but different parents. The list of parents is part of the data for a commit that is hashed. To see the data that is hashed, you can run:

git cat-file commit <id-of-commit>

The actual data that is hashed is a header of the form <type> <size>\0, where <size> is the length of the commit data in bytes, followed by the commit data itself. E.g. for a commit in my git clone:

$ printf 'commit %d\0' $(git cat-file commit 5498c5f05283cd248fd5e4f48cb8902e9ca6ce28 | wc -c) >tmp.dat
$ hexdump -C tmp.dat
00000000  63 6f 6d 6d 69 74 20 33  30 34 00                 |commit 304.|
0000000b
$ git cat-file commit 5498c5f05283cd248fd5e4f48cb8902e9ca6ce28 >>tmp.dat
$ sha1sum tmp.dat
5498c5f05283cd248fd5e4f48cb8902e9ca6ce28  tmp.dat

Note the sha1sum matches the commit id, and the parent is part of the commit object:

$ git cat-file commit 5498c5f05283cd248fd5e4f48cb8902e9ca6ce28 | grep parent
parent 3ba46634202968045e05e4d7f969d97c61efa53d
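
To make the effect of the parent line concrete, here is a minimal sketch that hashes two hand-written commit bodies without calling git at all. It assumes a SHA-1 repository and GNU sha1sum, as in the transcript above. The helper name git_object_id, the author identity and the timestamp are made up for illustration; the tree id is the well-known empty tree, and the two parent ids are simply borrowed from the logs in the question.

```shell
#!/bin/sh
# Hypothetical helper (not a git command): compute the id git would assign
# to an object of type $1 whose raw body is stored in file $2.
git_object_id() {
    size=$(wc -c < "$2" | tr -d ' ')
    # Header "<type> <size>\0" followed by the body, then SHA-1 of the whole stream.
    { printf '%s %s\0' "$1" "$size"; cat "$2"; } | sha1sum | cut -d ' ' -f 1
}

tmpdir=$(mktemp -d)

# An illustrative commit body. Every line here is part of the hashed data:
# tree, parent(s), author, committer, a blank line, then the message.
cat > "$tmpdir/a" <<'EOF'
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
parent e91e833158bb44f54f418cc5c3e1832452051428
author A U Thor <author@example.com> 1340571631 +0300
committer A U Thor <author@example.com> 1340571631 +0300

stuff after change
EOF

# Same body, but with a different parent.
sed 's/^parent .*/parent 4296e1bd046c4008166cfc516ef5ee2ce98a27d1/' \
    "$tmpdir/a" > "$tmpdir/b"

id_a=$(git_object_id commit "$tmpdir/a")
id_b=$(git_object_id commit "$tmpdir/b")

echo "parent e91e833: $id_a"
echo "parent 4296e1b: $id_b"
```

The two ids printed differ even though tree, author, dates and message are identical, so a commit that shows the same id in both clones has the same parent in both.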

git log outputs a flat list of commits, but commits in git form a directed acyclic graph. Because the log is a flattened list, it directly reflects the commit-to-parent relationship only in the trivial case of a simple linear history. If you have any merges (or are logging multiple branches), you cannot infer parentage directly from the order of the git log output: the commit printed below a given commit is not necessarily its parent.

To show the actual links from commit to parents you would need to use something like git log --graph (with which I normally recommend --oneline) or a graphical visualization tool such as gitk.
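
As a rough sketch of what that looks like, the script below builds a throwaway repository with one merge and then prints the parents explicitly. The repository location, branch name "side", identity and commit messages are all made up for illustration; the only commands that matter for your own repositories are the two git log invocations at the end (%h and %p are the abbreviated commit and parent hashes).

```shell
#!/bin/sh
# Throwaway repository used only for this illustration.
repo=$(mktemp -d)
git -C "$repo" init -q .
git -C "$repo" config user.name "Example"
git -C "$repo" config user.email "example@example.com"

# Two branches that diverge from a common base, then a merge.
git -C "$repo" commit -q --allow-empty -m "base"
main=$(git -C "$repo" symbolic-ref --short HEAD)   # whatever the default branch is called
git -C "$repo" checkout -q -b side
git -C "$repo" commit -q --allow-empty -m "side work"
git -C "$repo" checkout -q "$main"
git -C "$repo" commit -q --allow-empty -m "main work"
git -C "$repo" merge -q --no-ff -m "merge side" side

# Draw the graph so the commit-to-parent edges are visible.
git -C "$repo" log --graph --oneline

# Or list each commit with its parent ids spelled out.
git -C "$repo" log --format='%h  parents: %p  %s'

# The merge commit at HEAD has two parents.
parents=$(git -C "$repo" log -1 --format=%p)
parent_count=$(echo "$parents" | wc -w | tr -d ' ')
echo "HEAD has $parent_count parents: $parents"
```

Running the second git log command in both git1 and git2 and comparing the line for 4f438f9 should show whether its recorded parent really differs between the clones.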

Upvotes: 6
