Thomas Mccaffery

Reputation: 395

Doing multiple pull requests on GitHub

When I open a pull request on GitHub, all commits since my last request and all new ones are automatically added to this request. I can't seem to control which commits are added and which are not.

I did the following:

I looked at the pull request and both commits are present. I didn't want that. What am I doing wrong? I figured the two commits were on different branches. I am confused.

Upvotes: 1

Views: 1608

Answers (2)

user4945014

I see in your steps that you didn't run git add <files> between making a file change and git commit -m "Made a commit", nor between making a file change and git commit -m "Fixed a Bug". That might be significant.

Upvotes: 0

torek

Reputation: 490068

Summary

You need to be careful about branch starting points, and the resulting commit graphs.

Description

In Git, commits can be, and very often are, on more than one branch at a time. If you are coming from some other version control systems, this can be quite startling. For instance, in Mercurial—which is otherwise very similar to Git in many ways—every commit is on exactly one branch. In Git, however, each commit is on zero or more branches simultaneously.

The implications of this reach rather far, but the most immediate one for you is that when you make pull requests on GitHub, GitHub's Git manipulations do things you did not expect. (GitHub—which I have used rather a bit more lately—seems to attempt to hide many of Git's intricacies from users. This has its advantages, since Git's complexity can be intimidating. But it has disadvantages, such as leading you down this particular path.)

I will try to avoid going into too much detail here (difficult for me :-) ). The key to understanding all of this lies in visualizing the commit graph and knowing what, at a sort of fundamental level, a pull request is.

The commit graph

The graph, which is formally a Directed Acyclic Graph or DAG, is formed by the commits you and others make. Each commit generally has one parent commit. These parents form a backwards chain. If we start with a completely empty repository (no commits at all) and make a first commit, that commit has no parent, because there's no earlier commit:

A

(I'm giving the commits one-letter names here instead of Git's incomprehensible hash IDs). When we make the second commit, it has the first commit as its parent. We say that the new commit B "points to" existing commit A:

A <-B

This continues on for as many commits as we make: each new one points to some earlier one. For simple linear chains this is all pretty straightforward, and all we have to do here is add the fact that Git needs a name it can use to find the last commit. The name we all start out with is master:

A <-B <-C   <-- master

All of Git's arrows are always backwards. The ones coming from branch names move around—in fact, master, or whatever branch we are on, gets updated automatically as we add each new commit—so I think it's wise to keep drawing them; but the arrows that tell us that commit C points to B, and B points to A, are fixed forever. It's easier to draw these without the arrows, so that's what I do.
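A minimal sketch of those fixed parent links, assuming a throwaway repository. The commit helper, the demo identity, and the one-file-per-commit layout are made up for illustration; the subjects A, B, C stand in for the one-letter names used here.

```shell
set -e
repo=$(mktemp -d)                          # scratch repository, safe to delete afterwards
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # force the branch name used in the drawings
git config user.name demo                  # hypothetical identity, local to this repo
git config user.email demo@example.com

# Hypothetical helper: one new file per commit, subject = the one-letter name.
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit A
commit B
before=$(git rev-parse master)             # master points at B here
commit C
after=$(git rev-parse master)              # master moved on its own to C

# %s is the subject, %p the abbreviated parent hash(es); the root commit has none.
git log --format='%h %s  parent: %p' master

root_parents=$(git log -1 --format=%P master~2)   # parents of A
parent_of_c=$(git log -1 --format=%s master^)     # subject of C's parent
```

The branch name moved from B to C without being asked to, while the parent recorded inside each commit never changes.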

Now let's make a new branch name, with something that has lots of commits:

...--F--G--H   <-- master, patch

What we have now is two branch names, both pointing to the same commit! Commit H is on both branches. So are commits G and F and all the earlier ones. Now it becomes important to know which branch we're actually on, so let's add (HEAD) to one of these:

...--F--G--H   <-- master, patch (HEAD)

Aha, we're on branch patch. Let's add a new commit I now:

...--F--G--H   <-- master
            \
             I   <-- patch (HEAD)

What Git does is the same as always: it makes the new commit with one parent, so that I points to H, and then it moves the current branch—i.e., patch, as indicated by HEAD—so that it points to the new commit.

The master branch does not change at all: it still contains commits A through H inclusive. The patch branch now contains commits A through I inclusive.
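To check which branch names contain a given commit, something like the following sketch can be used. It builds a scratch repository shaped like the drawing; the commit helper and the demo identity are assumptions for illustration only.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # force the branch name used in the drawings
git config user.name demo                  # hypothetical identity, local to this repo
git config user.email demo@example.com

# Hypothetical helper: one new file per commit, subject = the one-letter name.
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit F
commit G
commit H
git checkout -q -b patch                   # new name, same commit H; HEAD now attached to patch
commit I                                   # only patch moves

# List the branch names from which a commit is reachable.
on_h=$(git for-each-ref --format='%(refname:short)' --contains master refs/heads | xargs)
on_i=$(git for-each-ref --format='%(refname:short)' --contains patch refs/heads | xargs)
master_tip=$(git log -1 --format=%s master)

echo "H is on: $on_h"
echo "I is on: $on_i"
```

Commit H shows up under both names, commit I only under patch, and master still ends at H.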

Let's make another new branch now, such as bugs. But where will the name bugs point? We have two pretty good options: we can point the name at either commit H, where master points now, or we can point it at commit I, where patch points now.

Whichever one we choose, we can now make a new commit J. But J's parent will be determined by which commit bugs points to once we create bugs! So it's pretty important which one we choose.

When you created your second branch, you in effect chose I as its starting point, and got this kind of picture:

...--F--G--H   <-- master
            \
             I   <-- patch
              \
               J   <-- bugs (HEAD)

If you choose H as the starting point, however, you'll get this other picture:

             J   <-- bugs (HEAD)
            /
...--F--G--H   <-- master
            \
             I   <-- patch
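The two pictures can be reproduced side by side in a scratch repository. This is only a sketch: the names bugs-stacked and bugs-separate are hypothetical (the drawings use a single name, bugs), as are the commit helper and the demo identity.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # force the branch name used in the drawings
git config user.name demo                  # hypothetical identity, local to this repo
git config user.email demo@example.com

# Hypothetical helper: one new file per commit, subject = the one-letter name.
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit F
commit G
commit H
git checkout -q -b patch
commit I

git checkout -q -b bugs-stacked patch      # first picture: start at I
commit J
git checkout -q -b bugs-separate master    # second picture: start at H
commit J

stacked_parent=$(git log -1 --format=%s bugs-stacked^)
separate_parent=$(git log -1 --format=%s bugs-separate^)
echo "stacked J points to $stacked_parent, separate J points to $separate_parent"

git log --all --decorate --oneline --graph
```

The explicit second argument to git checkout -b is what picks the starting point; leaving it out starts the new branch wherever HEAD happens to be.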

Pull requests

A pull request is always, necessarily, done with respect to some existing starting-point. If you use the old-school, pure-Git command, git request-pull, you supply the starting point. What GitHub does with their clicky buttons is automate the starting point selection for you. But there is a catch: the starting point must be a commit that both you and your upstream repository already share. (Your upstream, in this particular context, is the repository you forked. The word upstream is a bit ambiguous here: like much of Git's terminology, the word has been overloaded to mean many different things.)

In short, the starting-point commit is always going to be whatever their master points to now, or at least pointed to earlier (their own master may have moved forward since you forked things). That's going to wind up being commit H in our drawings above.

If your graph topology resembles the first one—commit J points back to I which points back to H—then a pull request from that point will include both commits. It does not matter that I is on several branches in your repository; what matters is that commit H is the latest one that you and the upstream repository are both using. (Again, each of these one-letter names stands in for an actual raw Git hash ID, such as commit 1f931caf22b7... or whatever.)
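A rough local preview of what such a request would carry is the set of commits reachable from your branch but not from the shared starting point. The sketch below assumes the local master stands in for the upstream's master, and rebuilds the first topology in a scratch repository with a made-up commit helper and demo identity.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # force the branch name used in the drawings
git config user.name demo                  # hypothetical identity, local to this repo
git config user.email demo@example.com

# Hypothetical helper: one new file per commit, subject = the one-letter name.
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit F
commit G
commit H
git checkout -q -b patch
commit I
git checkout -q -b bugs                    # starts at I, as in the first picture
commit J

# master..bugs means: reachable from bugs, excluding anything reachable from master.
in_request=$(git log --format=%s master..bugs | xargs)
# The newest commit both sides share.
shared=$(git log -1 --format=%s "$(git merge-base master bugs)")

echo "shared starting point: $shared"
echo "commits the request would carry: $in_request"
```

Here the range lists both J and I, even though I was made on a different branch name.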

To get just the one J commit to be in your pull request, you need your graph topology to resemble the second one, where H is J's parent. The GitHub interface seems—to me at least, though I still have relatively little experience with it—to be very good at hiding the graph topology from you, and thus very bad at letting you view the actual topology and hence see what will really happen with a pull request.
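If the branch already exists in the stacked shape, one possible way to reshape it is git rebase --onto, which copies the commits after patch up to bugs on top of master. This is a sketch in a scratch repository, not necessarily what you should run on real work; the commit helper and demo identity are assumptions, and a branch that was already pushed would need a force push after being rewritten.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # force the branch name used in the drawings
git config user.name demo                  # hypothetical identity, local to this repo
git config user.email demo@example.com

# Hypothetical helper: one new file per commit, subject = the one-letter name.
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit F
commit G
commit H
git checkout -q -b patch
commit I
git checkout -q -b bugs                    # stacked on I, as in the first picture
commit J

# Take the commits in patch..bugs (just J) and replay them on top of master.
git rebase -q --onto master patch bugs

in_request=$(git log --format=%s master..bugs | xargs)
new_parent=$(git log -1 --format=%s bugs^)
patch_tip=$(git log -1 --format=%s patch)

echo "bugs now carries: $in_request (parent: $new_parent); patch still ends at $patch_tip"
```

The copied J gets a new hash ID because its parent changed; the patch branch is left alone.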

When you use command-line Git, you can view (visualize) the topology with the help of "A DOG", all decorate oneline graph:

git log --all --decorate --oneline --graph

Various GUIs will also do this for you, e.g., gitk --all. This draws the graph vertically, rather than horizontally as I do in these StackOverflow postings, but the idea should be clear enough at this point.

Upvotes: 2
