Do forks of a repo introduce history on merges?

Question

I'm working on a project where most developers are using feature branches that get merged back into master via PRs. One developer decided to create a fork of the repo, create branches there and do PRs from there.

We've observed behavior where merging his PRs introduces files into the master branch (as expected), but merging in PRs from the feature branches seems to wipe them out.

Can a fork rewrite history in an upstream repo? What is happening here?

torek · Accepted Answer

Can a fork rewrite history in an upstream repo?

No.

What is happening here?

That's less clear: we would need to see the repositories in question, and the processes that you use.

As Vinyl Warmth noted in a comment, a "fork" is not a Git thing: it is a feature of GitHub, Bitbucket, and other web sites that provide "added value" on top of Git.

Keep this in mind at all times: Git does not really care about branch names. Git cares about commits. Branch names, within a repository, are only used to identify commits—specifically, one particular commit, which Git then calls the tip commit of that branch. It's the commits that matter. Commits are the history; the history is nothing but the commits. Branch names just serve to find the last commit of that branch. Each commit remembers its previous commit—its parent—and Git starts at the end and works backwards.

Git also does not really care too much about files. Git cares about commits. Each commit has a complete, independent snapshot of the entire source tree, so extracting one particular commit gets you that source tree: files come along for the ride. It's the commits that matter.

Cloning a repository copies all the (reachable) commits, starting from the various names (branch and tag names) and working backwards. You then git checkout one particular commit, and that is how you get files. The files you get are the files that are in that one particular commit.
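The commit-centric model above can be sketched as a toy in Python. This is only an illustration, not how Git is implemented: the `Commit` class, the single-letter IDs, and the file contents are all made up for the example, and real commits can have more than one parent (merges), which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    ident: str                   # stand-in for the real hash ID
    parent: Optional["Commit"]   # None for the very first commit
    files: tuple                 # full snapshot: (path, content) pairs


def history(tip):
    """Walk backwards from a tip commit, following parent links."""
    out = []
    while tip is not None:
        out.append(tip.ident)
        tip = tip.parent
    return out


a = Commit("A", None, (("README", "hello"),))
b = Commit("B", a, (("README", "hello"), ("main.py", "print(1)")))
c = Commit("C", b, (("main.py", "print(1)"),))  # README is absent from this snapshot

# A branch name only records one tip commit; everything else is found via parents.
branches = {"master": c, "feature": b}

master_history = history(branches["master"])
feature_history = history(branches["feature"])

# "Checking out" a commit yields exactly the files in that commit's snapshot.
master_files = dict(branches["master"].files)
feature_files = dict(branches["feature"].files)
```

In this toy, whether README "exists" depends only on which commit a name points to, which is the sense in which files come along for the ride.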

What git push does is connect your Git—your local clone—to some other Git. Your Git then transfers any commits you have that they don't, and has their Git set some name—typically some branch name—to remember the last of these new commits. So we really only care about the commits in the repository, as found by some name.
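As a rough sketch of that description, the following Python toy models a push as "send the commits they lack, then have them set a name". The `Repo` class and the `push` function are hypothetical names invented for this illustration. Real `git push` also negotiates over a network and normally refuses non-fast-forward updates, neither of which is modeled here.

```python
class Repo:
    def __init__(self):
        self.commits = {}   # ident -> parent ident (None for a root commit)
        self.names = {}     # branch name -> tip commit ident

    def commit(self, branch, ident):
        """Add a new commit on top of the branch's current tip."""
        self.commits[ident] = self.names.get(branch)
        self.names[branch] = ident


def push(src, dst, branch):
    """Send commits that dst lacks, then have dst set its branch name."""
    sent = []
    ident = src.names[branch]
    # Walk backwards from the tip until reaching a commit dst already has.
    while ident is not None and ident not in dst.commits:
        sent.append(ident)
        ident = src.commits[ident]
    for i in sent:
        dst.commits[i] = src.commits[i]
    dst.names[branch] = src.names[branch]
    return sent


server = Repo()
server.commit("master", "A")

local = Repo()
local.commits = dict(server.commits)   # a clone starts with the same commits
local.names = dict(server.names)
local.commit("master", "B")
local.commit("master", "C")

sent = push(local, server, "master")
```

Only B and C travel, because the server already has A; afterwards the server's `master` name identifies C.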

With that in mind, note that a fork on GitHub or Bitbucket is merely a clone with some extra fanciness handled in the web service. In particular, the server:

- remembers the repository from which the fork was cloned;
- depending on the server, may toss in yet more "added value" items such as repository popularity, some sort of "network graph" of who is using which repositories in what ways, and so on.

It's the first bullet-point here that is particularly relevant: because the web site has remembered the source of the new clone, the web site allows the user who owns the new clone to make pull requests to the user who owns the original.

Let's use some names to make it easier to keep track of what's going on. Let's call the original repository O (for original) and the fork F (for fork). What we have, in essence, is: F remembers O.

That's almost all there is to it! It's just yet another clone. Every clone you make in order to work with some repository, whether it's a clone of O or a clone of F, is yet another clone. If Oscar clones O and Oscar makes some changes to Oscar's clone and runs git push origin, Oscar pushes Oscar's commits to O, because Oscar's origin URL is that of O. If Fred clones F and makes some changes and runs git push origin, Fred pushes Fred's commits to F, because Fred's origin URL is that of F.

Oscar can now make a pull request within O, from one commit already stored in O (under some branch name that Oscar probably made up) to another branch name in O. Fred, meanwhile, can make a pull request to O from one commit stored in F, under any branch name Fred chooses, since F's names are now independent of O's names. When Fred does this, Fred's commits in F that aren't in O are first copied to O, where they are stored only under a "pull request" name and not under any branch name in O.

That, in essence, is the only difference: the commits Oscar made went straight into O and have a branch name in O along with the pull request, while the commits Fred made went into F first, then went into O when Fred made the pull request. Fred's commits in O are currently only found via the PR name, not via some branch name.
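The difference between Oscar's and Fred's pull requests can be sketched with another Python toy. Everything here is an assumption made for illustration: the `Repo` class, `fork`, `open_pull_request`, the branch names `oscar-feature` and `fred-wip`, and the PR numbers are all hypothetical, and a hosting site's real bookkeeping is certainly more involved.

```python
class Repo:
    def __init__(self, forked_from=None):
        self.commits = {}               # ident -> parent ident
        self.branches = {}              # branch name -> tip ident
        self.pulls = {}                 # PR number -> tip ident (hypothetical PR names)
        self.forked_from = forked_from  # the "F remembers O" link kept by the host

    def commit(self, branch, ident, base=None):
        """Add a commit to a branch; a new branch starts from `base`."""
        self.commits[ident] = self.branches.get(branch, base)
        self.branches[branch] = ident


def fork(original):
    """A fork is a clone that remembers where it came from."""
    f = Repo(forked_from=original)
    f.commits = dict(original.commits)
    f.branches = dict(original.branches)
    return f


def open_pull_request(source, branch, number):
    """Copy commits the target lacks; record the tip under a PR name only."""
    target = source.forked_from if source.forked_from is not None else source
    ident = source.branches[branch]
    while ident is not None and ident not in target.commits:
        target.commits[ident] = source.commits[ident]
        ident = source.commits[ident]
    target.pulls[number] = source.branches[branch]
    return target


O = Repo()
O.commit("master", "A")
F = fork(O)

# Oscar pushes straight to O, so his commit has a branch name in O.
O.commit("oscar-feature", "B", base="A")
open_pull_request(O, "oscar-feature", 1)

# Fred works in F; his commit reaches O only when the PR is opened.
F.commit("fred-wip", "C", base="A")
open_pull_request(F, "fred-wip", 2)
```

After both pull requests, O holds commits B and C, but only B is reachable from a branch name in O; C is reachable only through the PR entry.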

There's no particularly good reason for Fred to bother with the fork F if Fred can push directly to O. There's no particularly strong reason for Fred to avoid F either. Using F can make it slightly more difficult for co-workers to obtain Fred's work if Fred hasn't made a pull request yet, but it can make it slightly easier for Fred to come up with temporary branch names for things he thinks aren't ready for review, pull requests, and so on.
