user9583887

Reputation: 33

How do I pull changes from a branch in master to a subbranch?

So I have a branch B1 which is a branch of master. I later created branch B2 from branch B1. However, when I do git pull in B2, I'm not pulling the changes from branch B1. How do I pull changes from B1? Thank you.

Upvotes: 2

Views: 5852

Answers (2)

torek

Reputation: 490068

I recommend avoiding git pull entirely, at least (or especially) if you're just starting with Git. All git pull does is run git fetch, then run a second Git command for you. The problem is that this obscures what is really happening, preventing you from learning what you need to know.

What you need to know starts with what a commit really is and does, because Git is all about commits. Git is not about files, though a commit does hold files; and it's not even really about branches either, though branch names like b1 and b2 help you—or at least Git—find commits. But while commits hold files, and branch names hold commit hash IDs, Git is really all about the commits.

So, each commit:

  • Holds a snapshot of all of your files. That's a full snapshot, not changes since the previous commit.
  • Holds some extra information—some metadata—such as your name and email address, and why you made the commit you made. Most of this is just for other humans, but there's a key element in the metadata that's for Git itself, which we'll get to in a moment.
  • Has a hash ID. That hash ID is a big ugly string of letters and digits. It looks random, though it's entirely not-random. In fact, the hash ID is a cryptographic checksum of the complete contents of the commit: the files-as-snapshot, and the metadata.

The result of all this is that once made, no part of any commit can ever change. You can take everything out of a commit, make changes to it somewhere else, and then make a new commit from the result, but if you've changed even one single bit anywhere—your name, your log message, one bit of some source file, whatever—what you get is a new and different commit. So commit hash IDs are guaranteed to find one particular commit, and every Git uses the same hash algorithm, so all Gits everywhere use the same hash IDs for the same commits.
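To see the "change one bit, get a different ID" effect for yourself, here is a minimal sketch. It assumes a repository using the classic SHA-1 object format, where Git hashes a short header (the object type and size) followed by the object's bytes. The helper name git_object_id is made up for illustration. Commits are hashed the same way, with type commit and a body that names the snapshot, the parents, and the rest of the metadata.

```python
import hashlib

def git_object_id(kind: bytes, body: bytes) -> str:
    """SHA-1 object ID: hash of '<kind> <size>\\0' followed by the body."""
    header = kind + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

original = git_object_id(b"blob", b"print('hello')\n")
changed = git_object_id(b"blob", b"print('hellp')\n")  # one character differs

print(original)
print(changed)  # a completely different ID
# Same bytes in, same ID out, on every machine:
print(original == git_object_id(b"blob", b"print('hello')\n"))
```

This is only the hashing step; real Git also compresses and stores the object.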

How branches grow, or, there is no such thing as a sub-branch

The next thing you need to know is that there is no such thing as a sub-branch.

Given that every commit has its own unique hash ID, we're still left with a problem: how do we find commits? The hash IDs look random, and certainly are not in any sort of useful order. This is where Git's internal metadata, and branch names, come in.

Every commit holds, as part of its metadata, a list of parent commit hash IDs. Usually there is just one hash ID here. That one hash ID is the (singular) parent of this commit. This ties all commits together, one by one, backwards:

... <-F <-G <-H

where H stands for the hash ID of the last commit in the chain. Commit H itself records the actual hash ID of earlier commit G, so by looking inside H we can find the number—the hash ID—that lets us find G. Commit G stores the hash ID of earlier commit F, and so on.
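The backwards walk can be sketched as a toy model in Python. This is not how Git stores anything: the one-letter IDs and the parents table are stand-ins for real hash IDs and real commit metadata, and F is treated as the very first commit purely to keep the example short.

```python
# Toy commit store: each made-up ID maps to its parent's ID.
# None marks the end of the chain (F is the root in this sketch).
parents = {"F": None, "G": "F", "H": "G"}

def history(tip):
    """Walk backwards from tip, one parent at a time."""
    seen = []
    while tip is not None:
        seen.append(tip)
        tip = parents[tip]
    return seen

print(history("H"))  # ['H', 'G', 'F']
```

Given only the last commit, everything earlier is reachable; nothing later is.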

We've now reduced the problem to finding commit H. The way we do that is with a branch name like master or b1. We store the raw hash ID of commit H into the name, giving us:

...--F--G--H   <-- b1

Each branch name holds exactly one commit hash ID. If we make a second branch b2, we must pick one of the various existing commits, and make b2 hold that hash ID. Typically we might pick the current commit, i.e., the one with hash H:

...--F--G--H   <-- b1, b2

Now we need to remember which name we're using. Git uses the special name HEAD (in all uppercase like this) for that. There are several ways to draw that; here, I'll just attach HEAD in parentheses to one of these branch names:

...--F--G--H   <-- b1 (HEAD), b2

or:

...--F--G--H   <-- b1, b2 (HEAD)

At this point, let's make a new commit, in the usual way: modify something, git add, and git commit. Git builds the snapshot for the new commit from Git's index (something we won't describe properly here, but this index is why you have to run git add all the time) and adds the appropriate metadata, including a log message you must write. This new commit gets a new, unique hash ID. The data for the new commit is the new snapshot, and the metadata is what you said, except for the parent: that comes from what Git knows, which is that the current commit is H at this time. So new commit I points back to existing commit H:

...--F--G--H
            \
             I

and now Git updates whichever branch name HEAD is attached to. If that's b2, the name b2 now holds the hash ID of commit I:

...--F--G--H   <-- b1
            \
             I   <-- b2 (HEAD)

Note that commits up through H are now on both branches, while commit I is only on b2. The name HEAD remains attached to the current branch, but now the current commit is commit I.
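A toy model of that bookkeeping, with made-up names (ToyRepo and its fields are not anything Git exposes), might look like this. It ignores snapshots entirely and only tracks which name points where.

```python
class ToyRepo:
    """Tracks parents, branch names, and which name HEAD is attached to."""

    def __init__(self):
        self.parent = {"H": "G"}                # commit -> parent
        self.branches = {"b1": "H", "b2": "H"}  # name -> one commit ID
        self.head = "b2"                        # HEAD is attached to a name

    def commit(self, new_id):
        current = self.branches[self.head]
        self.parent[new_id] = current       # new commit points back
        self.branches[self.head] = new_id   # only the attached name moves

repo = ToyRepo()
repo.commit("I")
print(repo.branches)  # {'b1': 'H', 'b2': 'I'}
```

Nothing about H changed, and b1 did not move: the only things that happened were one added commit and one updated name.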

Merging

Suppose you start with:

...--G--H   <-- master (HEAD)

and then create names b1 and b2, both pointing to existing commit H. You then select b1 to work on, with git checkout b1, and make two commits:

          I--J   <-- b1 (HEAD)
         /
...--G--H   <-- master, b2

Note that b2 has not yet moved. Now you run git checkout b2 to select commit H again. The snapshots you made for I and J are still there, frozen for all time, but now you're working with the snapshot from H again. You can now make a couple of commits here on b2, giving:

          I--J   <-- b1
         /
...--G--H   <-- master
         \
          K--L   <-- b2 (HEAD)

Note that, at this point, you could git checkout master and make new commits there:

          o--o   <-- b1
         /
...--o--o--o--o--o--o   <-- master (HEAD)
         \
          o--o   <-- b2

We won't worry about this just yet, but note that nothing happened to any existing commit or any other branch name. Each time we added new commits, the current branch name moved. The new commits all just added to the existing graph of all-commits-so-far. It's totally impossible to change any existing commit, but we never did. All we did was add new commits that link back to the existing ones, and then change the branch names.

Let's go back to the idea of merging, though. We didn't make any new master commits after all, and if we git checkout b1, and temporarily stop drawing in the name master (because it will get in the way), we have:

          I--J   <-- b1 (HEAD)
         /
...--G--H
         \
          K--L   <-- b2

We can now run git merge b2, or we can run git checkout b2; git merge b1. Git will now do its best to combine work, because git merge is about combining work.

Now, each commit holds a snapshot, but combining work requires looking at changes. What Git does here is to use its internal difference engine, which you can invoke yourself with git diff. To make this work, Git first has to find the best common commit: a commit that is on both branches, and not too different from the latest commit on each branch.
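As a rough sketch of what "best common commit" means, the toy code below collects everything reachable from each tip and keeps the common commits that are not ancestors of some other common commit. The graph is the b1/b2 picture from above with G treated as the first commit, and the function names are mine. Real Git has its own, far more efficient, implementation, which you can run as git merge-base b1 b2.

```python
# Toy graph: commit -> list of parents.
parents = {
    "G": [], "H": ["G"],
    "I": ["H"], "J": ["I"],   # b1 ends at J
    "K": ["H"], "L": ["K"],   # b2 ends at L
}

def ancestors(tip):
    """Every commit reachable from tip, including tip itself."""
    found, todo = set(), [tip]
    while todo:
        c = todo.pop()
        if c not in found:
            found.add(c)
            todo.extend(parents[c])
    return found

def merge_base(ours, theirs):
    """Common commits that are not ancestors of another common commit."""
    common = ancestors(ours) & ancestors(theirs)
    return sorted(c for c in common
                  if not any(c in ancestors(o) for o in common if o != c))

print(merge_base("J", "L"))  # ['H']
```

The result is a list because tangled histories can have more than one equally good candidate; in the simple case there is exactly one.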

Here, it's super-obvious which commit is the best common commit. That's commit H. So Git will now compare the snapshot in H to the snapshot in whichever commit we picked with our git checkout—the commit HEAD is attached to. If we assume that's b1, the commit for this comparison is J:

git diff --find-renames <hash-of-H> <hash-of-J>   # what we changed

This tells Git what "we" changed on b1, with respect to the common starting point. Then Git diffs the other pairing:

git diff --find-renames <hash-of-H> <hash-of-L>   # what they changed

This tells Git what "they" changed on b2, with respect to the same starting point.

The merge operation now combines these changes, file by file. If we changed line 42 of main.py and they didn't touch main.py at all, Git takes our change. If they did touch main.py too, but not line 42, Git takes their change as well. If we both touched line 42, we'd better both have changed it the same way. If not, Git declares a merge conflict and leaves us with a mess.

Assuming there are no merge conflicts—the above is not a complete list of possible conflicts, just the obvious kind—Git will be able to combine the two sets-of-changes and apply all those changes to all the files as they appear in commit H. That way, the new commit that Git is about to make has our changes from J, but also has their changes from L.
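The combining rule can be sketched in a few lines of Python. This is a deliberately simplified toy: it assumes nobody adds or deletes lines, so line N in one version always corresponds to line N in the others. Real Git works on diff hunks and handles insertions, deletions, and renames. The function name merge_lines is made up.

```python
def merge_lines(base, ours, theirs):
    """Toy three-way merge for files whose line count never changes."""
    merged = []
    for number, (b, o, t) in enumerate(zip(base, ours, theirs), start=1):
        if o == t:      # both sides agree (same change, or no change)
            merged.append(o)
        elif o == b:    # only they changed this line
            merged.append(t)
        elif t == b:    # only we changed this line
            merged.append(o)
        else:           # both changed it, differently
            raise ValueError(f"merge conflict on line {number}")
    return merged

base   = ["import sys", "x = 1", "print(x)"]   # snapshot in H
ours   = ["import sys", "x = 2", "print(x)"]   # snapshot in J
theirs = ["import os",  "x = 1", "print(x)"]   # snapshot in L
print(merge_lines(base, ours, theirs))  # ['import os', 'x = 2', 'print(x)']
```

Note that the comparison is always against the base: without H, there is no way to tell who changed what.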

At this point, Git makes the new commit. As usual, the new commit has a snapshot: that's the one Git built by combining changes. As usual, the new commit has a parent; since we're on branch b1, this parent is J. But—not as usual—the new commit has a second parent too. Since we told Git to merge commit L, this second parent is L.

Having made the commit, Git drags our current branch name forward:

          I--J
         /    \
...--G--H      M   <-- b1 (HEAD)
         \    /
          K--L   <-- b2

and this is the result of running git merge.
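In the same toy style as before (made-up names, no snapshots), the only new thing a merge commit adds to the bookkeeping is the second parent:

```python
# Toy graph before the merge: commit -> list of parents.
parents = {"H": ["G"], "I": ["H"], "J": ["I"], "K": ["H"], "L": ["K"]}
branches = {"b1": "J", "b2": "L"}
head = "b1"

def merge(other, new_id):
    """Record a merge commit: first parent is our tip, second is theirs."""
    parents[new_id] = [branches[head], branches[other]]
    branches[head] = new_id   # only the current branch name moves

merge("b2", "M")
print(parents["M"])  # ['J', 'L']
print(branches)      # {'b1': 'M', 'b2': 'L'}
```

This sketch always records a merge commit and skips the work of building M's snapshot. The name b2 still points to L afterwards, exactly as in the drawing.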

Git is distributed

Now, you're probably not the only person who makes commits. Other people make commits too. You typically start by cloning a repository, perhaps from GitHub, perhaps from somewhere else. This gets you all of their commits. Your Git is going to call this other Git origin, by default.

We already saw that in any repository, we find the last commit of some branch by its branch name. So their Git—the one over on GitHub, or whatever—has some branch names:

...--o--o   <-- master
         \
          o--o--o   <-- develop

and so on. When you clone their repository, you get all of their commits, with their unique hash IDs that stay those hash IDs. But you don't get their branch names. Those are theirs. Your Git takes their names and changes them, so that you can have your own branch names.

The changes your Git makes to their branch names turn their names into your remote-tracking names. These names look like origin/master and origin/develop. These remember, in your repository, where their branch names were:

...--D--E   <-- origin/master
         \
          F--G--H   <-- origin/develop

For the last step of your git clone, your Git creates a new branch name, pointing to the same commit as one of their branch names. You can choose which branch this is but usually it's just master, so that you get:

...--D--E   <-- master (HEAD), origin/master
         \
          F--G--H   <-- origin/develop

If you now go making new commits, you get this:

          I--J   <-- master (HEAD)
         /
...--D--E   <-- origin/master
         \
          F--G--H   <-- origin/develop

Now, suppose that they, whoever they are, manage to create two new commits on their master. Your Git does not have these commits yet, but you can get them.

You run:

git fetch origin

(or just git fetch, which defaults to origin). Your Git calls up their Git again. They tell your Git about their branch names (and other names) and your Git discovers that, gosh, there are new commits K-L to get from them. So your Git downloads them, and then updates your origin/master accordingly:

          I--J   <-- master (HEAD)
         /
...--D--E--K--L   <-- origin/master
         \
          F--G--H   <-- origin/develop

(I've assumed here they did not add new commits to their develop, otherwise we'd have new commits on origin/develop now too.)

So, this is what git fetch is all about: your Git calls up some other Git, and gets new commits from that Git (if there are any to get) and then updates your remote-tracking names. After this git fetch you may have new commits. What if you want to use them?
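Here is a toy model of that, using plain dictionaries as stand-ins for the two repositories (the layout and the function name fetch are assumptions for illustration only). Real Git negotiates which commits are missing before sending anything; the sketch simply copies whatever is not already there.

```python
def fetch(mine, theirs, remote="origin"):
    """Copy commits we lack, then update remote-tracking names only."""
    for commit_id, parent_list in theirs["commits"].items():
        mine["commits"].setdefault(commit_id, parent_list)
    for name, tip in theirs["branches"].items():
        mine["remote_tracking"][f"{remote}/{name}"] = tip

theirs = {
    "commits": {"E": ["D"], "K": ["E"], "L": ["K"]},
    "branches": {"master": "L"},
}
mine = {
    "commits": {"E": ["D"], "I": ["E"], "J": ["I"]},
    "branches": {"master": "J"},
    "remote_tracking": {"origin/master": "E"},
}

fetch(mine, theirs)
print(mine["branches"])         # {'master': 'J'}
print(mine["remote_tracking"])  # {'origin/master': 'L'}
```

Your own master is untouched; only origin/master moved, and commits K and L are now available locally.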

To use their new commits after fetching, you need a second Git command

Let's say that you'd like to merge their work with your work, to create merge commit M in the usual way. You can now run:

git merge origin/master

This will make new merge commit M on its own, if it can, merging your J with their L, exactly as we saw above. It doesn't matter that commit L is found by a remote-tracking name instead of a branch name. Git doesn't care about names; Git cares about commits. So we get:

          I--J--M   <-- master (HEAD)
         /     /
...--D--E--K--L   <-- origin/master
         \
          F--G--H   <-- origin/develop

This is where git pull comes in

The pull command just combines two commands into one:

  • run git fetch to get new commits from them; then
  • run a second Git command to incorporate those new commits.

The default second command is git merge, so git pull uses the commits that just came in to run a git merge. But git pull can run a deliberately limited kind of fetch, and doesn't just merge with any new commits from them. To know what to merge, git pull requires that you say what to use here.

The confusing part is that when you name what to use on the git pull command, you must use their branch names, even though your Git is going to work with your own remote-tracking names in a moment. So if you want to fetch from origin, then merge with what they call b2, you need:

git pull origin b2

This runs a limited kind of git fetch, then merges with whichever commit they're naming via their b2, that your Git names via your origin/b2. The missing slash here is because the first step—the git fetch—needs to know which Git to call up (the one at origin) and the second step needs to know which of their branch names to use.
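Put together as a toy model, the two-step nature of pull looks like this. Everything here is a simplified stand-in: the dictionaries, the function names, and the explicit new_id argument are made up. The sketch fetches every branch rather than a limited set, and it always records a merge commit, where real Git may be able to fast-forward instead.

```python
def fetch(mine, theirs, remote):
    """Step 1: get their commits, update remote-tracking names."""
    mine["commits"].update(theirs["commits"])
    for name, tip in theirs["branches"].items():
        mine["remote_tracking"][f"{remote}/{name}"] = tip

def merge(mine, other_tip, new_id):
    """Step 2: record a merge commit on the current branch."""
    current = mine["head"]
    mine["commits"][new_id] = [mine["branches"][current], other_tip]
    mine["branches"][current] = new_id

def pull(mine, theirs, remote, their_branch, new_id):
    """Toy 'git pull <remote> <branch>': fetch, then merge the fetched tip."""
    fetch(mine, theirs, remote)
    merge(mine, mine["remote_tracking"][f"{remote}/{their_branch}"], new_id)

theirs = {"commits": {"E": ["D"], "K": ["E"], "L": ["K"]},
          "branches": {"master": "L"}}
mine = {"commits": {"E": ["D"], "I": ["E"], "J": ["I"]},
        "branches": {"master": "J"},
        "remote_tracking": {"origin/master": "E"},
        "head": "master"}

pull(mine, theirs, "origin", "master", "M")
print(mine["commits"]["M"])  # ['J', 'L']
print(mine["branches"])      # {'master': 'M'}
```

The remote and the branch are separate arguments to pull, just as on the command line; the slash only appears in the remote-tracking name that the merge step looks up.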

Both of these two steps, git fetch and git merge, can fail. It's very unlikely for git fetch to fail (and git pull will stop if it does), but if you keep these as two separate commands, you can tell which one failed, if either one did. Moreover—and for me, much more important—you can look to see what git fetch fetched before you run git merge. And last, not so important once you know it, you'll realize that these are two entirely separate operations in Git: fetch gets commits; merge combines work.

It's safe to run git fetch any time, from any branch. Fetch just calls up another Git and gets commits from them. This does not touch anything you're doing right now. It's not so safe to run git merge any time: that merges something into your current branch, and you only have one current branch at any time. If you have lots of branches to update, you can git fetch and update all your remote-tracking names all at once, but then you have to, one by one, git checkout the branches you want updated, and git merge each one, with whichever commit you want to merge with—probably from the remote-tracking name for that branch, but because you can look, you can check first to see if you really do want that merge.

Sending commits to them

You now have commits I-J-M (with M having two parents, the first being J and the second being L) that they don't have. To send them your commits, you can use git push, providing they give you permission. (Note: these permissions are controlled by the hosting software, not by Git itself.) Running:

git push origin master

has your Git call up their Git, give them commits you have that they don't, that they'll need—in this case, I, J, and M—and then ask them to set their branch name. Note that you do not ask them to set a remote-tracking name. There are no remote-tracking names in this direction! You just ask them to set a branch name directly.

Sometimes, they'll accept this request. Sometimes they might refuse, because they've set up preventive barriers—maybe on specific branch names, and again, this is controlled by the hosting software—or because they just don't allow pushes at all. Or, they might refuse because you didn't merge, or didn't merge recently enough. Suppose they had the above, but while you were running git merge, someone else added another commit. If you run git fetch now, you'll get:

          I--J--M   <-- master (HEAD)
         /     /
...--D--E--K--L--N   <-- origin/master
         \
          F--G--H   <-- origin/develop

which means that their master now names commit N, which didn't exist a moment ago. You might need to merge again, or maybe you want to remove your commit M entirely (which is a little tricky sometimes) and make a new merge O that incorporates N this time.
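The "didn't merge recently enough" refusal can be sketched as a toy check: the receiving side accepts the new branch tip only if its current tip is reachable from it, which is the fast-forward rule Git applies by default. The dictionaries and function names are made up, and hosting software may add further rules of its own.

```python
def is_ancestor(commits, old, new):
    """True if old is reachable from new by following parents."""
    todo, seen = [new], set()
    while todo:
        c = todo.pop()
        if c == old:
            return True
        if c not in seen:
            seen.add(c)
            todo.extend(commits.get(c, []))
    return False

def push(mine, theirs, branch):
    """Toy push: send commits, then ask them to move their branch name."""
    new_tip = mine["branches"][branch]
    old_tip = theirs["branches"][branch]
    if not is_ancestor(mine["commits"], old_tip, new_tip):
        return "rejected"   # their tip would be lost: fetch and merge first
    theirs["commits"].update(mine["commits"])
    theirs["branches"][branch] = new_tip
    return "accepted"

theirs = {"commits": {"E": ["D"], "K": ["E"], "L": ["K"], "N": ["L"]},
          "branches": {"master": "N"}}
mine = {"commits": {"E": ["D"], "I": ["E"], "J": ["I"], "K": ["E"],
                    "L": ["K"], "M": ["J", "L"]},
        "branches": {"master": "M"}}

first = push(mine, theirs, "master")   # M does not include N
mine["commits"]["N"] = ["L"]           # what a fetch would bring in
mine["commits"]["O"] = ["M", "N"]      # a new merge on top of M
mine["branches"]["master"] = "O"
second = push(mine, theirs, "master")
print(first, second)  # rejected accepted
```

Here the fix was to merge again on top of M; replacing M with a fresh merge, as mentioned above, would pass the same check.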

Upvotes: 4

Vikash

Reputation: 56

You can either merge branch b1 into branch b2, or rebase branch b2 onto b1.

git checkout b2
git merge b1

or

git checkout b2
git rebase b1

Upvotes: 2
