user2379740

For Git, is it absolutely true that same commit hash value means same repository?

If I understand Git correctly, each commit comes with a SHA-1 checksum, and to generate that hash Git also takes the previous (parent) commit as part of the hash function's input. That is to say, barring a hash collision (whether accidental or the result of an attack), if I see that the last commits of two repositories have the same hash value, I can be confident that the two repositories are exactly the same.

Is this understanding correct?

Upvotes: 0

Views: 2235

Answers (2)

bk2204

Reputation: 76409

When two commits in two separate repositories have the same object ID, they will refer to the same history, including all commits, trees, and blobs reachable from them, assuming no hash collisions have occurred.
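To see why one object ID pins down the whole history, here is a minimal Python sketch of how Git's SHA-1 object format derives IDs: each object is hashed as `sha1(b"<type> <size>\0" + body)`, a tree stores the raw IDs of its entries, and a commit's text names its tree and its parent by ID. The helper names, the author identity, and the timestamp are made up for illustration, and the tree builder only handles plain files in a single directory.

```python
import hashlib
from typing import List, Optional, Tuple

def git_object_id(obj_type: str, body: bytes) -> str:
    """Object ID in Git's SHA-1 format: sha1(b"<type> <size>\\0" + body)."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tree_body(entries: List[Tuple[str, str, str]]) -> bytes:
    """Tree body from (mode, name, hex_id) entries; plain files only, sorted by name."""
    out = b""
    for mode, name, oid in sorted(entries, key=lambda e: e[1]):
        # Each entry: "<mode> <name>\0" followed by the 20 raw bytes of the ID.
        out += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
    return out

def commit_body(tree_id: str, parent_id: Optional[str], message: str) -> bytes:
    """Commit text that names its tree and (optional) parent by ID."""
    # Hypothetical fixed identity and timestamp so the IDs are reproducible.
    ident = "A U Thor <author@example.com> 1700000000 +0000"
    lines = [f"tree {tree_id}"]
    if parent_id is not None:
        lines.append(f"parent {parent_id}")
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return ("\n".join(lines) + "\n").encode()

blob = git_object_id("blob", b"hello world\n")
tree = git_object_id("tree", tree_body([("100644", "hello.txt", blob)]))

root = git_object_id("commit", commit_body(tree, None, "initial"))
tip = git_object_id("commit", commit_body(tree, root, "second"))

# Same files and same tip message, but the root commit's message differs.
other_root = git_object_id("commit", commit_body(tree, None, "initial (reworded)"))
other_tip = git_object_id("commit", commit_body(tree, other_root, "second"))

print(tip)
print(other_tip)
print(tip == other_tip)  # False: the change in the root propagates to the tip
```

Running `echo 'hello world' | git hash-object --stdin` should print the same ID as `blob` above. Rewording only the root commit changes its ID, which changes the `parent` line of the next commit and therefore that commit's ID too, so two tips can only share an ID if everything reachable behind them matches (collisions aside).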

Note that this does not mean that the repositories are completely identical. Those two repositories might have branches, tags, or other references pointing to different commits, and they may also have different sets of objects referred to by the reflog.
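A toy model can make this distinction concrete. The sketch below is plain Python, not Git: it uses made-up commit names, assumes a linear history (merge commits, which have several parents, are left out), and models each repository only by its refs. Both repositories have `main` at the same commit, so the history behind `main` is identical, yet one of them has an extra `feature` branch and therefore an extra commit.

```python
# Hypothetical model: short strings stand in for real commit IDs, and
# `parents` maps each commit to its single parent (None for the root).
# A commit ID fixes its content, so both repositories agree on this map;
# it is kept as one global dict for simplicity.
parents = {"c1": None, "c2": "c1", "c3": "c2", "c4": "c3"}

# Each repository is modelled only by its refs.
repo_a = {"refs/heads/main": "c3"}
repo_b = {"refs/heads/main": "c3", "refs/heads/feature": "c4"}

def history(commit):
    """Commits reachable from `commit` by following parent links."""
    seen = []
    while commit is not None:
        seen.append(commit)
        commit = parents[commit]
    return seen

def reachable(repo):
    """Every commit reachable from any ref in the repository."""
    commits = set()
    for tip in repo.values():
        commits.update(history(tip))
    return commits

print(history(repo_a["refs/heads/main"]))  # ['c3', 'c2', 'c1']
print(history(repo_b["refs/heads/main"]))  # ['c3', 'c2', 'c1']
print(sorted(reachable(repo_b) - reachable(repo_a)))  # ['c4']
```

The shared tip guarantees the same history behind `main`, but says nothing about what the other refs (or the reflog) in each repository point at.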

Note that if you are using a SHA-1 repository, it is not safe to rely on the absence of hash collisions. The cost to create a SHA-1 collision is approximately USD 11,000, so any medium-sized company or government agency can afford to create collisions. While Git has measures to detect colliding objects being pushed to a repository, those measures have no effect if the two repositories are separate. If you require integrity, you need to use a SHA-256 repository instead.
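For comparison, here is a sketch of the same blob hashed under both object formats. It assumes, as I understand Git's hash-function transition, that SHA-256 repositories (created with `git init --object-format=sha256`, available since Git 2.29) keep the same `"<type> <size>\0"` object header and only swap the hash function; the `object_id` helper is made up for illustration.

```python
import hashlib

def object_id(body: bytes, algo: str, obj_type: str = "blob") -> str:
    """Git-style object ID using the given hashlib algorithm name.
    Assumes the "<type> <size>\\0" header is the same for SHA-1 and SHA-256."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.new(algo, header + body).hexdigest()

data = b"hello world\n"
sha1_id = object_id(data, "sha1")
sha256_id = object_id(data, "sha256")

# 40 hex digits (160 bits) versus 64 hex digits (256 bits).
print(len(sha1_id), sha1_id)
print(len(sha256_id), sha256_id)
```

The longer digest is not the only point: SHA-256 has no known practical collision attack, whereas SHA-1 does.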

Upvotes: 2

Stanislav Bashkyrtsev

Reputation: 15308

Since the probability of a SHA-1 collision is so small that we can neglect it, we can treat the hash as a unique identifier of the content it represents. Therefore, if two commits from different repos have the same SHA-1, then these commits are identical and their histories are identical. It doesn't mean that those repos have the same list of commits, though.

By the way, GitHub uses this property extensively: internally, they combine all forks of a repo into one big repo. This way they eliminate extra copying.
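The reason forks can share storage is that Git objects are content-addressed: identical content always gets the same ID, so it only needs to be stored once. The toy store below illustrates that idea only; it is not how GitHub actually implements fork networks, and the `ObjectStore` class is hypothetical.

```python
import hashlib

class ObjectStore:
    """Toy content-addressed store keyed by Git-style SHA-1 object IDs."""

    def __init__(self) -> None:
        self.objects = {}  # object ID -> raw body

    def put(self, obj_type: str, body: bytes) -> str:
        """Store an object and return its ID; duplicate content is kept once."""
        header = f"{obj_type} {len(body)}".encode() + b"\0"
        oid = hashlib.sha1(header + body).hexdigest()
        self.objects.setdefault(oid, body)  # no-op if the ID is already stored
        return oid

shared = ObjectStore()

# The upstream repo and its fork write their files into the same store.
upstream = [shared.put("blob", data) for data in (b"README\n", b"main.c\n")]
fork = [shared.put("blob", data) for data in (b"README\n", b"main.c\n", b"patch\n")]

print(len(upstream) + len(fork))  # 5 objects written
print(len(shared.objects))        # 3 objects actually stored
```

Only the fork's new content (`patch`) costs extra space; everything it shares with upstream resolves to IDs that are already present.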

Upvotes: 3
