Sisi

Reputation: 41

What happens to commits on a branch after the branch has been created and merged via a pull request?

I am working on a team, and one of my teammates created a pull request from a branch that wasn't finished. If the pull request is approved and merged, will future commits on that branch automatically be pulled into the main branch?

Upvotes: 3

Views: 5756

Answers (2)

torek

Reputation: 489708

Your question is focused on pull requests—but it really shouldn't be. What you need to focus on here are the commits, and how you find them. This involves branch names.

Long

Commits are permanent¹ and unchangeable.² So, nothing ever happens to any commit, as long as you can find it.

Branches—or more precisely, branch names—in Git are temporary and highly changeable. Any branch name can be renamed, or even removed entirely, at any time. This means that branch names have no meaning other than any meaning you give them. You must maintain that meaning, if you want it to be maintained.

The correct question, then, is not "what happens to my commits" (nothing ever happens to those) but rather "what happens to my branch names". A good subsidiary question is: given that branch names are so fragile, why do we even bother with them?


¹ More correctly, they are permanent as long as you can find them. If you can't find a commit, do you still have it? (How will you know?)

² This is a property of all of Git's internal objects. Commits are made up from three of Git's four kinds of internal objects; the fourth kind is used for annotated tags. Once made, any of these objects is completely unchangeable. So no part of any commit can ever change.


Branches are for finding commits

The real name of any commit is a hash ID. Run git log on your computer, and note the hash IDs that spill out. Or, look closely at various pages on GitHub for clickable links (often colored blue) whose names are weird strings of letters and digits like d1b10fc. These are hash IDs, typically for commits (though sometimes for some other Git internal object). Shortened ones like d1b10fc are abbreviations; the full hash ID is much longer.

These big ugly strings express a number in hexadecimal. This number is the "true name" of a commit. Each commit gets one, and it has to be different from that of every other commit ever,³ which is why it has to be so big and ugly. Git needs this number in order to find the commit.
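To give a rough idea of where such a number comes from: in a repository that uses Git's default SHA-1 object format, an object's ID is the SHA-1 checksum of a short header (object type, size, a NUL byte) followed by the object's content. The sketch below reproduces that for a blob; a commit is hashed the same way, with the type "commit" and the commit's text as content. The function name git_object_id is made up for this example.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the ID Git would assign to an object with this content.

    Assumes the default SHA-1 object format (not a SHA-256 repository).
    """
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


full_id = git_object_id("blob", b"hello world\n")
short_id = full_id[:7]  # an abbreviation, like the short links on GitHub
print(full_id)
print(short_id)
```

Because the ID is computed from the content, changing even one byte would give a different ID, which is one way to see why an existing object can never change.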

But: would you ever want to type in this number? How would you even remember it? Memorizing 2e36527f23b7f6ae15e6f21ac3b08bf3fed6ee48 is right out, and even 2e36527f23 is too much. But what if you only had to remember a name like main or master?


³ Technically, two different Git commits could have the same ID as long as they never meet. I like to call these doppelgängers, since they would have the effect of breaking Git: a true harbinger of bad luck. In practice, though, they just don't occur.


Drawing a picture

The main things to know about commits, in Git, are these:

  • Every commit is numbered, with a big ugly random-looking hash ID.
  • Every commit holds a full snapshot of every file.
  • Every commit also holds some metadata: information such as who made the commit, and when.

In the metadata for each commit, Git stores the hash ID of a previous or parent commit. Technically this is a list of zero or more parent hash IDs, but most commits have just one, so the usual case is that each commit remembers the commit that comes before it.

What this means is that when we have a string of commits, they are all in a row, in the order they were made, and each one "points backwards" to the previous commit. If we use uppercase letters to stand in for the actual hash IDs (because the actual hash IDs are too big and ugly to bother with), we can draw this like so:

... <-F <-G <-H

Here H stands in for the hash ID of the last commit in this chain. If we somehow know the hash ID of H, we can have Git find it for us. Inside commit H, Git will find:

  • a full snapshot of every file, saved forever as a read-only archive, as the files appeared at the time we (or whoever) made commit H; and
  • information such as who made commit H.

In that information, Git will find the hash ID of earlier commit G. So Git can now use that to find the snapshot in G and the metadata in G. The metadata in G include the hash ID of earlier commit F, so from here, Git can move back to commit F.

This repeats forever, or rather, until Git gets to the very first commit ever made, which—by definition—doesn't have any commit that comes before it. So it just doesn't list any. This lets Git stop going backwards.
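The backwards walk just described can be sketched as a toy model. This is not how Git stores anything internally; the Commit class, the walk_back function, and the choice to treat F as the very first commit are all assumptions made to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass(frozen=True)
class Commit:
    """A toy commit: a stand-in ID, a snapshot, and one optional parent."""
    commit_id: str
    snapshot: str
    parent: Optional[str]  # None marks the very first (root) commit


def walk_back(store: Dict[str, Commit], start: str) -> Iterator[str]:
    """Yield commit IDs from `start` backwards until the root commit."""
    current: Optional[str] = start
    while current is not None:
        yield current
        current = store[current].parent


# In this toy history, F is assumed to be the first commit ever made.
store = {
    "F": Commit("F", "files as of F", None),
    "G": Commit("G", "files as of G", "F"),
    "H": Commit("H", "files as of H", "G"),
}
history = list(walk_back(store, "H"))
print(history)
```

Note that the walk needs a starting ID: without knowing H, there is no way in to the chain.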

There are several key items to note here:

  1. Git works backwards.
  2. Git needs—desperately—some way to find commit H.

The second point is where branch names come in. Let's make the name main find commit H, by storing the hash ID for H in main:

...--F--G--H   <-- main

We can add more names. These names can point to the same last commit, like this:

...--F--G--H   <-- main, develop

Now develop also says that commit H is the last commit of develop. This means all the commits are on both branches. We need a way to pick which name we're actually using, so we'll attach a special name, HEAD, to exactly one branch:

...--F--G--H   <-- main, develop (HEAD)

This means we're on branch develop, as git status will say. If we now make a new commit, this is what actually happens in Git:

...--F--G--H   <-- main
            \
             I   <-- develop (HEAD)

That is, we make a new commit: a new snapshot, with metadata that says we made this, just now; the previous or parent commit is commit H. Git assigns this new commit its new big ugly unique hash ID—we won't worry about how, but every Git will agree that this commit gets this hash ID, and no other commit can have it—and we've drawn in that hash ID as the letter I instead of using the actual hash ID (because that's too ugly, plus I don't even know what it would be).

The last step of making this commit, for Git, is to write the hash ID into the branch name to which HEAD is attached. That makes the name develop find commit I.

If we make a second new commit, we get:

...--F--G--H   <-- main
            \
             I--J   <-- develop (HEAD)

The name develop, which we're still using, now records the hash ID of the new commit J we just made. J points backwards to I: commit I is commit J's parent. Commit I points backwards to H, which points backwards to G, and so on. The backwards-workings of Git are preserved, no matter how many commits we add.
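Here is a minimal sketch of that bookkeeping, under the assumption that a repository is nothing more than a table of parents, a table of branch names, and a HEAD. ToyRepo and its attributes are hypothetical names, not anything Git itself exposes.

```python
class ToyRepo:
    def __init__(self):
        self.parents = {}   # commit ID -> parent commit ID (or None)
        self.branches = {}  # branch name -> ID of its last commit
        self.head = None    # name of the branch HEAD is attached to

    def commit(self, new_id):
        """Make a commit whose parent is the current branch's last commit,
        then write the new ID into the branch HEAD is attached to."""
        self.parents[new_id] = self.branches.get(self.head)
        self.branches[self.head] = new_id


repo = ToyRepo()
repo.head = "main"
for letter in "FGH":
    repo.commit(letter)

repo.branches["develop"] = repo.branches["main"]  # new name, same commit H
repo.head = "develop"                             # attach HEAD to develop
repo.commit("I")
repo.commit("J")
print(repo.branches)
```

The name main never moves here, because HEAD was attached to develop while I and J were made.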

Now, if we decide that branch develop was a bad idea, we can:

  1. switch back to main, and then
  2. delete the name develop.

Nothing happens to the commits (yet) but they become very hard to find:

...--F--G--H   <-- main (HEAD)
            \
             I--J   ???

Eventually, Git will notice that (a) these commits can't be found (unless you know their hash IDs, that is) and (b) they've been sitting around like this too long. Git will now discard these unfindable (except by raw hash ID) commits. So besides just finding commits, branch names have an extra side effect: making them findable preserves them. This rule applies to all the commits backwards from the last one, as found by the branch name, so commits up through H are all still protected here.

Note that branch names are not the only kind of name in Git. Git also has, for instance, tag names. These find commits, and have the same side effect of protecting commits. There are other names, such as remote-tracking names, special names that Git uses for git bisect, and so on. All of these do the same thing: they all find one commit, and thereby protect that one commit and all commits backwards from there.
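The "findable means protected" rule amounts to a reachability check. The sketch below is a simplified model: it assumes names are the only thing that protects commits, whereas real Git also consults reflogs and grace periods before discarding anything. The function and variable names are illustrative.

```python
def reachable(parents, refs):
    """Return every commit ID found by walking backwards from any name."""
    seen = set()
    stack = list(refs.values())
    while stack:
        commit_id = stack.pop()
        if commit_id in seen:
            continue
        seen.add(commit_id)
        stack.extend(parents[commit_id])
    return seen


# parents maps each commit to a list of parent IDs (empty for the root).
parents = {"F": [], "G": ["F"], "H": ["G"], "I": ["H"], "J": ["I"]}
refs = {"main": "H", "develop": "J"}

before = reachable(parents, refs)
del refs["develop"]  # delete the name; the commits themselves are untouched
after = reachable(parents, refs)
unfindable = sorted(set(parents) - after)
print(unfindable)
```

Adding any other name (a tag, say) that points to J would put I and J back in the reachable set.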

Pull requests in GitHub

A GitHub pull request is not a Git thing: GitHub invented these, and other hosting sites copied the invention. It is a way for GitHub to have their Git attach a special name (not a branch name, but a name built from the pull-request number) to some particular commit.

This makes the pull request, and the commits findable by that PR number, persist on GitHub, in their Git repository. If you want to get these commits into your repository you'll generally give them a branch name in your repository (because branch names are easier to work with, on your end).

Pull requests in GitHub have a bunch of complications, because GitHub also added to Git the concept of a fork. A fork, in GitHub, is a clone that adds special features. One of the big features added here is (insert drumroll here) ... the Pull Request.

Each Git repository is independent. That's true after a normal everyday clone: you make a clone to your laptop, for instance, and this clone copies the commits of some original repository to some new repository, without copying its branch names. Instead, the original repository's branch names become remote-tracking names in your laptop-clone. This involves renaming them: instead of main, for instance, you get origin/main. Then your Git, on your laptop, makes one branch in the clone, using one of the renamed remote-tracking names. So you get a branch named main that points to the same commit as your origin/main, which your Git made from their branch main.

When you use the GitHub fork button, however, GitHub has their Git make their clone—on their system—in a slightly different way:

  • First, they copy all the branch names. So your GitHub-side clone has the same commits and the same branches. That's different from your laptop-side clone, which only has the same commits; your laptop has remote-tracking names instead of branch names, plus one new branch name that matches one original branch name.
  • Second, they link together the new GitHub-side clone and the original. This makes it possible for you to use your GitHub clone to make Pull Requests from your GitHub-side clone/fork, to whoever owns the original GitHub-side repository.
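The difference in how names get copied can be sketched like this. It models only the naming behavior described above; clone_refs and fork_refs are hypothetical helpers, and the remote name "origin" and the checked-out branch "main" are assumed defaults.

```python
def clone_refs(original_branches, remote="origin", checkout="main"):
    """Names an ordinary laptop-side clone ends up with (simplified)."""
    # Every original branch name becomes a renamed remote-tracking name.
    refs = {f"{remote}/{name}": tip for name, tip in original_branches.items()}
    # The clone then makes one branch from one of those names.
    refs[checkout] = original_branches[checkout]
    return refs


def fork_refs(original_branches):
    """Names a GitHub-side fork starts with: branch names copied as-is."""
    return dict(original_branches)


original = {"main": "H", "develop": "J"}
laptop = clone_refs(original)
fork = fork_refs(original)
print(laptop)
print(fork)
```

In both cases the commits are the same; only the set of names that find them differs.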

Not everyone actually uses these features, and we can't tell, from your question, whether your group is using these features. A group (organization or entity) can give you direct write access to their GitHub-side repository, in which case you can create new branch names in their GitHub-side repository, without having to go through your own GitHub-side repository.

To summarize this section:

  • In general, when using Git and GitHub, you have at least two repositories: one on GitHub and one on your own computer (the one I am calling "your laptop", whether or not it's actually a laptop). You might, however, have three repositories: the original on GitHub, your fork on GitHub, and your repository on your laptop.

  • To make a Pull Request on GitHub, you must send your commits to a GitHub repository. This can be your fork (which—since it's yours—you can write to), or perhaps the original on GitHub (which, since it's not yours, you can only write to if they—whoever "they" are—gave you permission). You do this by creating or updating some branch name in whichever GitHub repository you send commits to.

  • Then, you use the GitHub web interface to create the new Pull Request from a branch in one of these GitHub-side repositories. You choose the destination branch name and repository, too. This winds up with GitHub making a special non-branch name, of the form refs/pull/number/head, in at least one of the GitHub-side repositories. This serves to retain the commits even if you delete your branch name in the GitHub-side repository. If you're doing this across a GitHub fork, GitHub makes the commits available to the other GitHub repository; if you're doing it in a single repository on GitHub, they're already available to the single repository.

You don't generally need to worry about the special refs/pull/ names: GitHub manages those all on its own. But they do exist and do protect the commits.
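If it helps to see how these names sit alongside ordinary ones, here is a small sketch that sorts full reference names by the namespace they live in. The describe_ref function is made up for illustration, and the PR number 123 is an arbitrary example, not one from the question.

```python
import re

# refs/pull/<number>/head is the name form described above.
PULL_HEAD = re.compile(r"^refs/pull/(\d+)/head$")

PREFIXES = [
    ("refs/heads/", "branch"),
    ("refs/tags/", "tag"),
    ("refs/remotes/", "remote-tracking name"),
]


def describe_ref(full_name: str) -> str:
    """Classify a full reference name by the namespace it lives in."""
    match = PULL_HEAD.match(full_name)
    if match:
        return f"pull request #{match.group(1)}"
    for prefix, kind in PREFIXES:
        if full_name.startswith(prefix):
            return f"{kind} {full_name[len(prefix):]}"
    return "other"


for name in ["refs/heads/main", "refs/remotes/origin/main", "refs/pull/123/head"]:
    print(name, "->", describe_ref(name))
```

Whatever the namespace, each of these names does the same job: it holds one hash ID and thereby finds, and protects, one chain of commits.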

Updating a Pull Request

If a PR on GitHub is still open (has not been merged or otherwise closed yet), anyone with the right permissions, you included, can git push from their laptop to the GitHub repository where the Pull Request was first made, to the branch name that was used when making that PR.

While we haven't covered how git push works, this causes GitHub to update that branch name. When GitHub does that, they notice "hey, we made a PR from this branch and it's still open", and automatically update the pull request.

So, suppose your colleague pushed to some branch B in some GitHub repository, and made a PR from that branch, and that PR is still open. This is the setup you have described.

Suppose further that you have a branch on your laptop that finds those same commits—the same hash IDs, in other words. You can make new additional commits on your laptop. These commits are, at this point, only on your laptop.

You can now git push this last commit (it will drag in earlier ones as needed: this is all automatic) to any repository to which you can git push. If you can git push to the GitHub repository from which the PR was made, and to the same branch name, you can use this to update your colleague's PR.

If you cannot push to this particular branch or repository, it would be wise to get the PR closed. You'll then push your commits to some GitHub repository that you can push to, and open a new PR when the full set of commits is ready. You'll do this with branch names: branch names that exist locally, on your laptop, in your Git repository, and branch names (which can be the same or different) in some GitHub-side repository to which you can git push. These branch names will supply the hash ID of the last commit in some chain of commits. Git will work backwards from there as needed.

Upvotes: 4

Code-Apprentice

Reputation: 83577

will future commits auto pull into the main branch?

No, commits will only be added to the branch where they are committed. Git does nothing automatically that would alter the history of your code. If you want these commits merged to the main branch, you should have your teammate submit a new PR.
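A toy model may make this concrete. It assumes the PR is merged with an ordinary merge commit (not a squash or rebase), and the commit letters and the branch name feature are placeholders.

```python
def ancestors(parents, tip):
    """Return the set of commits found by walking backwards from `tip`."""
    seen, stack = set(), [tip]
    while stack:
        commit_id = stack.pop()
        if commit_id not in seen:
            seen.add(commit_id)
            stack.extend(parents[commit_id])
    return seen


parents = {"H": [], "I": ["H"], "J": ["I"]}
branches = {"main": "H", "feature": "J"}

# Merging the PR: merge commit M joins main's last commit and feature's.
parents["M"] = [branches["main"], branches["feature"]]
branches["main"] = "M"

# Later, the teammate adds commit K to the same branch.
parents["K"] = [branches["feature"]]
branches["feature"] = "K"

on_main = ancestors(parents, branches["main"])
print("J" in on_main, "K" in on_main)
```

Commit J is on main because it was there when the merge happened; K was made afterwards and stays on its own branch until someone merges it.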

Upvotes: 3
