Reputation: 19

How does GitHub identify commits affecting a file path during a merge commit?

I've read in other posts that a commit with two parents usually indicates a merge commit. Filtering on that lets me remove basically all of the merge commits with messages like "Merge pull request" or "Merge master into branch". These usually don't appear when you view a file's commit history.
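A minimal sketch of the parent-count filter described above. The `Commit` class and the sample history are hypothetical in-memory stand-ins, not any real Git or GitHub API. With the Git CLI, `git log --no-merges` applies a similar filter.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    # Hypothetical stand-in for a commit: just an ID, parent IDs, and a message.
    sha: str
    parents: list[str] = field(default_factory=list)
    message: str = ""


def is_merge(commit: Commit) -> bool:
    # Two or more parents means a merge (octopus merges can have more than two).
    return len(commit.parents) >= 2


def drop_merges(commits: list[Commit]) -> list[Commit]:
    # Keep only commits with zero or one parent.
    return [c for c in commits if not is_merge(c)]


history = [
    Commit("c4", ["c3", "f1"], "Merge branch 'feature'"),
    Commit("f1", ["c2"], "Edit foo.txt on feature"),
    Commit("c3", ["c2"], "Edit foo.txt on master"),
    Commit("c2", ["c1"], "Edit bar.txt"),
    Commit("c1", [], "Initial commit"),
]

print([c.sha for c in drop_merges(history)])  # ['f1', 'c3', 'c2', 'c1']
```

The weakness the question points out is visible here. `drop_merges` decides from the parent count alone and never looks at file contents, so it cannot tell whether `c4` changed `foo.txt`.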

But in some cases, a merge commit does appear in a file's commit history (though not for every file the merge touched, only for the specific files that apparently had a change).

If I filter out all commits with two parents, I might lose commits that belong in a file path's commit history. How is GitHub able to tell when a merge commit should go in a file's commit history and when it shouldn't?

Upvotes: 1

Answers (1)

Brian61354270

Reputation: 14423

(Disclaimer: I am not specifically familiar with how GitHub reports the history of a file. This answer addresses how tools in general can determine whether a (merge) commit affects a particular file.)

There's a common misconception that commits in git are internally represented as diffs. That is, if you look inside a commit, you'll see something like "change line 7 in foo.txt to 'blah blah blah'". This idea is reinforced by the fact that pretty much all command line tools that let you view individual commits (git show, git diff, etc.) report the contents as some sort of diff. With that understanding, it's not at all clear what the contents of merge commits would look like, much less why some merges that include modifications to a particular file are treated as being part of the file's history, while others are not.

But that's not actually how commits are represented. Commits in git reference exact states of files (specifically, each commit references a particular tree object, which records the exact state of every tracked file in the project). If I give you a commit ID, you can figure out the exact contents of any particular file without ever looking at what the contents were in the parent commits.
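To make the snapshot idea concrete, here is a toy model, assuming a commit can be represented as a hypothetical `Commit` object whose `tree` maps each path to that file's full contents. Reading a file needs only the commit itself. With real Git, `git show <commit>:foo.txt` does the equivalent lookup.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Commit:
    # Hypothetical model: a commit stores a full snapshot (path -> contents),
    # not a list of changes relative to its parents.
    sha: str
    tree: dict[str, str]
    parents: list["Commit"] = field(default_factory=list)


def file_at(commit: Commit, path: str) -> Optional[str]:
    # Look the path up in this commit's own snapshot; parents are never consulted.
    # Returns None if the file does not exist in this commit.
    return commit.tree.get(path)


b = Commit("B", {"foo.txt": "hello", "bar.txt": "x"})
a = Commit("A", {"foo.txt": "hello, world", "bar.txt": "x"}, [b])

print(file_at(a, "foo.txt"))  # hello, world
```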

Since commits store exact states of files, there is no way to look at a commit in a vacuum and know which files it affected. The commit simply doesn't know. The only way to know which files a commit affected is by comparing the contents of its tree with the tree(s) of its parent commit(s).
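A sketch of that comparison, assuming trees are plain dicts mapping paths to contents. This is a simplification: real Git compares the object IDs stored in tree entries, and `git diff-tree` is the plumbing command that reports this kind of tree-to-tree difference. A path counts as affected if it was added, deleted, or modified.

```python
def changed_paths(parent_tree: dict[str, str], tree: dict[str, str]) -> set[str]:
    # The union of both path sets catches additions and deletions as well as edits;
    # .get() returns None for a path missing on one side, which never equals real contents.
    return {
        path
        for path in parent_tree.keys() | tree.keys()
        if parent_tree.get(path) != tree.get(path)
    }


parent = {"foo.txt": "v1", "bar.txt": "x", "old.txt": "gone soon"}
child = {"foo.txt": "v2", "bar.txt": "x", "new.txt": "added"}

print(sorted(changed_paths(parent, child)))  # ['foo.txt', 'new.txt', 'old.txt']
```

For a root commit with no parent, passing an empty dict as `parent_tree` reports every file as added.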

This actually makes the question of whether a merge commit affected a particular file very simple. In fact, it's no different than the question of whether an "ordinary" commit with a single parent affected a particular file.

Say we have a commit A with parent B. We want to know if A should be included in the history of some file foo.txt. To answer that question, we look at the contents of foo.txt at A and check whether they exactly match its contents at B. If they do, then A didn't affect foo.txt, so we should not include A in the history of foo.txt. But if the contents do not match, then A records a modified state of foo.txt, so we should include it in foo.txt's history.
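The single-parent rule can be sketched as a walk down a linear history. `Commit`, `touches`, and `file_history` are hypothetical names for this illustration, and the sketch assumes every commit has at most one parent. A root commit is compared against an empty tree.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    # Hypothetical model of a commit in a linear history (at most one parent).
    sha: str
    tree: dict[str, str]
    parent: Optional["Commit"] = None


def touches(commit: Commit, path: str) -> bool:
    # Compare this commit's version of the path with its parent's version.
    # A root commit is compared against an empty tree.
    parent_tree = commit.parent.tree if commit.parent is not None else {}
    return commit.tree.get(path) != parent_tree.get(path)


def file_history(head: Commit, path: str) -> list[str]:
    # Walk back from head, keeping only commits whose snapshot of the path differs.
    shas = []
    commit: Optional[Commit] = head
    while commit is not None:
        if touches(commit, path):
            shas.append(commit.sha)
        commit = commit.parent
    return shas


c1 = Commit("c1", {"foo.txt": "v1", "bar.txt": "x"})
c2 = Commit("c2", {"foo.txt": "v1", "bar.txt": "y"}, c1)
c3 = Commit("c3", {"foo.txt": "v2", "bar.txt": "y"}, c2)

print(file_history(c3, "foo.txt"))  # ['c3', 'c1']
print(file_history(c3, "bar.txt"))  # ['c2', 'c1']
```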

What happens if A has more than one parent commit? We just do the same check for each parent: does the content of foo.txt at A exactly match its content at that parent? If we find a match at any parent, then A did not produce a new version of foo.txt (it simply carried that parent's version forward), so we need not include A in the history of foo.txt.
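Extending the check to any number of parents is a small change: the commit is kept only if its version of the path differs from every parent's. Git's own `git log` documentation calls a commit whose tree matches a parent for the given paths "TREESAME" to that parent. The function below is a hypothetical sketch of the rule as described here, not GitHub's actual implementation.

```python
def include_in_history(
    tree: dict[str, str], parent_trees: list[dict[str, str]], path: str
) -> bool:
    mine = tree.get(path)
    if not parent_trees:
        # Root commit: include it if it introduces the file.
        return mine is not None
    # Include only if the path differs from EVERY parent; a match with any
    # single parent means this commit just carried that parent's version forward.
    return all(mine != parent.get(path) for parent in parent_trees)


main_tip = {"foo.txt": "base + main edit", "bar.txt": "base"}
feature_tip = {"foo.txt": "base + feature edit", "bar.txt": "base + feature edit"}
merge = {"foo.txt": "base + main edit + feature edit", "bar.txt": "base + feature edit"}

print(include_in_history(merge, [main_tip, feature_tip], "foo.txt"))  # True
print(include_in_history(merge, [main_tip, feature_tip], "bar.txt"))  # False
```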

In practice, this means that a merge commit will typically be included in the history of a file foo.txt when the file was changed in both parents' histories, relative to their common ancestor. The state of foo.txt in the merge commit is then some combination of the changes from both histories, either determined automatically by a merge strategy or picked manually by the merger during conflict resolution, and so it usually matches neither parent. If only one side changed foo.txt, the merge simply takes that side's version, matches that parent, and is left out of foo.txt's history.
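The "both sides changed it" pattern can be checked with a toy three-way merge. This is a deliberate simplification: `merge_trees`, `in_history`, and `combine` are hypothetical helpers, each file is merged as a whole (real Git merges line by line within a file), and the `resolve` callback stands in for a merge strategy or a person resolving a conflict.

```python
def merge_trees(base, ours, theirs, resolve):
    # Whole-file three-way merge over path -> contents dicts.
    merged = {}
    for path in base.keys() | ours.keys() | theirs.keys():
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o  # both sides agree (or neither changed it)
        elif o == b:
            result = t  # only "theirs" changed it
        elif t == b:
            result = o  # only "ours" changed it
        else:
            result = resolve(path, o, t)  # both changed it: needs a combination
        if result is not None:  # None means the file was deleted
            merged[path] = result
    return merged


def in_history(merged, parent_trees, path):
    # Same rule as for any commit: include only if it differs from every parent.
    return all(merged.get(path) != p.get(path) for p in parent_trees)


def combine(path, ours, theirs):
    # Toy resolver; assumes neither side deleted the file.
    return ours + " / " + theirs


base = {"foo.txt": "base", "bar.txt": "base"}
main_tip = {"foo.txt": "base + main edit", "bar.txt": "base"}
feature_tip = {"foo.txt": "base + feature edit", "bar.txt": "base + feature edit"}

merged = merge_trees(base, main_tip, feature_tip, combine)
print(in_history(merged, [main_tip, feature_tip], "foo.txt"))  # True
print(in_history(merged, [main_tip, feature_tip], "bar.txt"))  # False
```

If `combine` instead returned one side's version unchanged (for example, keeping only `ours`), the merged foo.txt would match that parent and the merge would not appear in foo.txt's history.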

Upvotes: 2
