
Why is "master" required in git pull --rebase origin master but not git rebase -i origin?

When I want to rebase against remote master, I use

git pull --rebase origin master

If I use

git pull --rebase origin

I receive the error

You asked to pull from the remote 'origin', but did not specify
a branch. Because this is not the default configured remote
for your current branch, you must specify a branch on the command line.

But why is it that

git rebase -i origin

works?

And in this case

git rebase -i origin master

actually results in

fatal: fatal: no such branch/commit 'master'

I have no local branch named master, but why is it not going to the remote branch in this case?

Upvotes: 0

Answers (1)

torek

Reputation: 488183

The git pull command is quite different from most other Git commands. I'd say that in many ways, the closest other Git command is git gc, which—like git pull—is a convenience wrapper to avoid needing to type in multiple separate Git commands.¹

What git pull does is:

1. run git fetch; then
2. run a second Git command.

The first command, git fetch, needs the name of a remote. The name origin is the standard name for the first remote, and since most Git repositories only have one remote, origin is the name for the first, last, and only remote in that repository.

You can leave this off—you can run git pull with no additional arguments—and Git will figure out some appropriate remote. But if you're going to supply additional arguments, the first non-option argument is the remote name, so git pull frabjous uses the word frabjous as the name of the remote.

The second command is either git merge or git rebase.² This second command needs a commit hash ID, such as 4c53a8c20f8984adb226293a3ffd7b88c3f4ac1a, or something that will work in place of a commit hash ID.³ Usually, though, we use a name—a branch name like master or main, or dev, or whatever—as the "something that will work" here. The general idea—the way to think of git pull—is: get stuff from the other guy, then incorporate it. The "other guy" here is the remote, and the "stuff to get" is "any new commits he has on some branch of his". So the name you put in here, when you put in a name, is the other guy's branch name.

Note that, as with git fetch, you can leave all of this out, and just run:

git pull

The pull command will figure out a remote to use—probably origin—and a name to use, all on its own, based on the upstream that you have set for the current branch. The "upstream" is just a thing-you-can-set: for your branch named xyzzy, the upstream is probably already set to origin/xyzzy.

Note that the upstream name here, origin/xyzzy, has a slash in it: it's made up of the name of the remote, origin, then a slash, then the branch name as seen on the remote, xyzzy. So if the branch name as seen on the remote is frab/jous, you'd have origin/frab/jous here, with two slashes: one to separate origin from the other guy's branch name, and one in the other guy's branch name.
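To see what upstream a branch has, you can ask Git directly. The following is a minimal sketch, not part of the original setup: it builds a throwaway pair of repositories under a temporary directory, and the helper g, the directory names theirs and ours, and the demo identity are all made up for illustration.

```shell
# Hypothetical helper: run git in directory $1 with a throwaway identity.
g() { dir=$1; shift; git -C "$dir" -c user.name=demo -c user.email=demo@example.com "$@"; }

work=$(mktemp -d)

# "Their" repository: one commit on a branch named xyzzy.
git init -q "$work/theirs"
g "$work/theirs" symbolic-ref HEAD refs/heads/xyzzy
g "$work/theirs" commit -q --allow-empty -m "first commit"

# Cloning creates our own xyzzy and records origin/xyzzy as its upstream.
git clone -q "$work/theirs" "$work/ours"

# Ask for the upstream of xyzzy, and for the remote half of that setting.
upstream=$(g "$work/ours" rev-parse --abbrev-ref 'xyzzy@{upstream}')
remote=$(g "$work/ours" config branch.xyzzy.remote)
echo "upstream: $upstream (remote: $remote)"
```

In a fresh clone like this one, the output should name origin/xyzzy as the upstream and origin as the remote, which is the pair a bare git pull would use.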

If you're going to put in a name at all, on your git pull command, you must place this after the remote. Having done that, Git assumes you'll just put in the branch name as seen on the remote. So you type in:

git pull origin frab/jous

or whatever here, to mean:

1. run git fetch origin; then
2. resolve origin/frab/jous to a hash ID and run git merge or git rebase as appropriate.

Note that either of these two steps can fail entirely, and the second one can stop in the middle. If one step fails, any remaining steps don't happen at all, and you should restart from the failed point, whatever that was, if you want to pick up where it left off—so you need to know which step failed, if one fails. Luckily for most of us, git fetch is very safe to run an extra time so we can mostly just ignore its failure-vs-success. But you still need to know whether to finish a stopped-in-the-middle merge or rebase. For this and other reasons, I always encourage Git newbies to learn the separate commands first. Recognizing when they've worked, when they've failed completely, and when they have stopped in the middle is important.

Unfortunately, that means you need to learn that oddity, where git pull takes the other guy's name for the branch (leaving out the origin/), and git merge or git rebase takes your name (including the origin/). But you were going to have to learn this anyway. Make a note of it! Their branch names are theirs; your Git repository reads their name-and-hash-ID values from them (during the git fetch step), and stores them in your Git repository under these origin/-prefixed names.
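The naming difference is easier to see side by side. This is a sketch under assumptions: the two repositories are throwaway ones created in a temporary directory, and the helper g and the demo identity are invented for the example. It first spells out the two steps by hand (using our name, origin/master), then lets git pull do both (using their name, master).

```shell
# Hypothetical helper: run git in directory $1 with a throwaway identity.
g() { dir=$1; shift; git -C "$dir" -c user.name=demo -c user.email=demo@example.com "$@"; }

work=$(mktemp -d)
git init -q "$work/theirs"
g "$work/theirs" symbolic-ref HEAD refs/heads/master
g "$work/theirs" commit -q --allow-empty -m "first"
git clone -q "$work/theirs" "$work/ours"

# They add a commit; we run the two steps as separate commands.
g "$work/theirs" commit -q --allow-empty -m "second"
g "$work/ours" fetch -q origin            # step 1: name the remote
g "$work/ours" rebase -q origin/master    # step 2: OUR name for their master
after_two_steps=$(g "$work/ours" rev-parse HEAD)
theirs_second=$(g "$work/theirs" rev-parse master)

# They add another commit; this time the wrapper takes THEIR branch name.
g "$work/theirs" commit -q --allow-empty -m "third"
g "$work/ours" pull -q --rebase origin master
after_pull=$(g "$work/ours" rev-parse HEAD)
theirs_third=$(g "$work/theirs" rev-parse master)

echo "two steps: $after_two_steps"
echo "pull:      $after_pull"
```

Since we have no commits of our own here, both forms simply bring our master up to their latest commit. The point is only the spelling: origin/master for git rebase, origin master for git pull.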

This is still leaving out a lot. Git has a very steep setup learning curve. I'll take a break now for footnotes and then address one other thing.

¹git gc runs git repack, git prune-packed, git reflog expire, git worktree prune, git prune, git pack-refs, and/or git rerere gc if/as appropriate. This isn't meant to be a completely exhaustive list as the list has changed at times (e.g., git worktree didn't exist before Git 2.5) and I don't really keep track. I generated this list by glancing over the git gc documentation. I think this particular manual page might have been the main inspiration for https://git-man-page-generator.lokaltog.net/ 😀

²There are a few special-case exceptions, including doing nothing at all if the git fetch step fails.

³This is a bit of an oversimplification, as git merge and git rebase can take more than one hash ID, and for a case that is never used by git pull, git rebase also requires a branch name. For the purpose of being run by git pull, though, they wind up using hash IDs here.

`origin` is both a remote and ... well...

But why is it that
git rebase -i origin
works?

Here's where another piece of the steep learning curve whacks you in the face.

Git is, in the end, all about commits. The commits in the repository are the reason for using Git. The individual commits are numbered, but the numbers themselves are big, ugly, random-looking things that are completely unsuited for humans. These are the hash IDs or Object IDs, that spill out from git log for instance. They're only really usable via cut-and-paste, so we mostly don't use them after all: we use names.

As a result, Git provides not one but two key-value databases. One of these is indexed by the hash IDs, and that's how Git gains access to its commits and other internal data. Git puts in a hash ID, and gets the commit or other object whose key is that particular hash ID. When the object is a commit object, that represents a full snapshot of every file, frozen for all time in the form it had at the time you (or whoever) made the commit.

To find the hash ID, though, Git keeps a second database where the keys are names: branch names, tag names, and other sorts of names. The branch names, like master or main, dev or develop, frab/jous, and so on, are up to you: you can choose any name you like (although it's wise to stick in a dash or slash or letter outside the [0-9a-f] set, because the "names" cafebabe and badf00d and deadcab could be abbreviated hash IDs). To keep branch and tag names from bumping into each other, Git actually sticks refs/heads/ in front of each branch name, and refs/tags/ in front of each tag name.

The names that Git stores in your repository, so as to remember some other Git repository's branch names, are remote-tracking names (Git calls these remote-tracking branch names) and are actually prefixed with refs/remotes/, so rather than origin/dev, these are really refs/remotes/origin/dev.
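You can dump the full names with git for-each-ref. A minimal sketch, assuming a throwaway repository pair in a temporary directory (the helper g, the branch name dev, the tag v1.0, and the demo identity are illustrative choices, not anything from a real project):

```shell
# Hypothetical helper: run git in directory $1 with a throwaway identity.
g() { dir=$1; shift; git -C "$dir" -c user.name=demo -c user.email=demo@example.com "$@"; }

work=$(mktemp -d)

# "Their" repository: one commit on branch dev, plus a tag.
git init -q "$work/theirs"
g "$work/theirs" symbolic-ref HEAD refs/heads/dev
g "$work/theirs" commit -q --allow-empty -m "first"
g "$work/theirs" tag v1.0

# Our clone gets a branch, remote-tracking names, and the tag.
git clone -q "$work/theirs" "$work/ours"

# Print every name with its full spelling, one per line.
refs=$(g "$work/ours" for-each-ref --format='%(refname)')
echo "$refs"
```

The listing should include refs/heads/dev, refs/remotes/origin/dev, and refs/tags/v1.0, each in its own namespace, along with refs/remotes/origin/HEAD, which comes up again further down.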

All of these names, in these various namespaces, hold one hash ID each. That's all Git needs, because commits themselves also hold other commit hash IDs. From one commit, Git can find another one. From there, Git can find yet another commit—and so on, and on. Git simply defines a branch name as "this name holds the hash ID of the commit that is to be called the latest on this branch".

So, if you're on some branch main, the name holds some hash ID H, which is the hash ID of some commit:

            <-H   <-- main

Each commit holds a list of previous-commit hash IDs, usually just one entry long, along with the snapshot of all files. That's the backwards-pointing arrow coming out of H, here. Commit H holds the hash ID of some earlier commit. Let's call that one G and draw it in:

        <-G <-H   <-- main

Of course, G is a commit with a snapshot and another backwards arrow, so it must point to some earlier commit, which repeats over and over:

... <-F <-G <-H   <-- main

and that's a Git branch. To add a commit to a branch, we "check it out" or "switch to it" by name, making the name the current branch name and the corresponding commit H the current commit.

We can have more than one name pointing to this commit. Let's draw in several names: main and dev and also origin/main, which isn't a branch name but still points to a commit. For laziness I'll stop bothering with arrows between commits, but remember that Git only works backwards, never forwards:

...--F--G--H   <-- dev, main, origin/main

We pick one branch—let's say dev—to switch to. To remember that we're using the name dev, we attach the special name HEAD to it:

...--F--G--H   <-- dev (HEAD), main, origin/main

Now we fiddle around the way we do with Git—which I won't cover here but the index or staging area (two terms for the same thing) is crucial—and eventually make some new commit. The new commit, which we'll call I, has a new unique hash ID and points backwards to existing commit H, like this:

...--F--G--H
            \
             I

The tricky bit is that Git updates the current branch name as soon as it has finished making new commit I. None of the other names are updated, so they all still point to H:

...--F--G--H   <-- main, origin/main
            \
             I   <-- dev (HEAD)

Commit I is now the latest commit on dev. Commits up through H are still on dev, and continue to be on main as well. The special name HEAD is still attached to dev, and our current commit is now commit I. Commit H still exists (and, crucially for Git's hashing scheme, is completely untouched: this is why the arrows all go backwards, not forwards).
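The same picture can be reproduced in a scratch repository. This is a sketch only: the helper g, the temporary directory, and the demo identity are assumptions made for the example, and the commits are empty ones that stand in for H and I.

```shell
# Hypothetical helper: run git in directory $1 with a throwaway identity.
g() { dir=$1; shift; git -C "$dir" -c user.name=demo -c user.email=demo@example.com "$@"; }

work=$(mktemp -d)
git init -q "$work/repo"
g "$work/repo" symbolic-ref HEAD refs/heads/main
g "$work/repo" commit -q --allow-empty -m "H"

# Create dev pointing at H and attach HEAD to it.
g "$work/repo" checkout -q -b dev
h=$(g "$work/repo" rev-parse main)

# Make commit I while on dev.
g "$work/repo" commit -q --allow-empty -m "I"

main_now=$(g "$work/repo" rev-parse main)     # still H
dev_now=$(g "$work/repo" rev-parse dev)       # now I
parent_of_dev=$(g "$work/repo" rev-parse dev~1)  # I points back to H

echo "main: $main_now"
echo "dev:  $dev_now (parent $parent_of_dev)"
```

Only dev moved. The name main still holds the hash ID of H, and the new commit's parent is H.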

Okay, but—so what? Well, Git is, as I said earlier, all about the commits. When you give Git a branch name, most of the time it very quickly turns that name into a hash ID by figuring out where the name points. (The git switch and git checkout commands are unusual here in that they have to remember the name, too, so that you can become "on" that branch when they're done.) There's a command-line Git command that does this for you, namely git rev-parse. If we give git rev-parse some branch names, we can see it in action:

$ git rev-parse master
5d01301f2b865aa8dba1654d3f447ce9d21db0b5
$ git rev-parse diff-merge-base
fa1c8acabf0d5649baf87f549d67426d14255e0f

It can parse tag names too though, and remote-tracking names, and with --symbolic-full-name it can tell us what the full spelling of each name is:

$ git rev-parse --symbolic-full-name v2.35.1
refs/tags/v2.35.1
$ git rev-parse --symbolic-full-name origin/master
refs/remotes/origin/master
$ git rev-parse origin/master
5d01301f2b865aa8dba1654d3f447ce9d21db0b5

What happens if we give it origin alone?

$ git rev-parse origin
5d01301f2b865aa8dba1654d3f447ce9d21db0b5
$ git rev-parse --symbolic-full-name origin
refs/remotes/origin/master

Well, that's a bit peculiar, isn't it? Let's take a look at the gitrevisions documentation, which is crucially important and cleverly hidden in plain sight in a pile of 1000 largely unreadable manual pages:

SPECIFYING REVISIONS
...
<refname> e.g., master, heads/master, refs/heads/master
... a <refname> is disambiguated by taking the first match in the following rules:

1. If $GIT_DIR/<refname> exists, that is what you mean (this is usually useful only for HEAD, FETCH_HEAD, ORIG_HEAD, MERGE_HEAD and CHERRY_PICK_HEAD);
2. otherwise, refs/<refname> if it exists;
3. otherwise, refs/tags/<refname> if it exists;
4. otherwise, refs/heads/<refname> if it exists;
5. otherwise, refs/remotes/<refname> if it exists;
6. otherwise, refs/remotes/<refname>/HEAD if it exists.

It's this six-step rule that makes name abbreviations work. We write:

git rebase master

and Git tries master as a file in .git (step 1), but that doesn't exist, so Git goes on to try refs/master as a name (step 2). That doesn't exist either, and neither does refs/tags/master (step 3), so Git tries refs/heads/master as a name (step 4). That one does exist, in this repository anyway, so it resolves to a hash ID and the revision is fully specified.

If we use origin/master, step 5 finds it, because refs/remotes/origin/master exists (use git for-each-ref to dump out the ref table, and see that it does exist). And if we use origin—which doesn't seem to be a ref-name at all—step 6 finds it, because refs/remotes/origin/HEAD exists.
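The order of the steps matters when one short name fits more than one rule. The sketch below is an illustration under assumptions (a scratch repository, a made-up helper g, and a name demo deliberately used for both a tag and a branch): the tag should win for the plain name because refs/tags/ is tried before refs/heads/, while heads/demo reaches the branch through the refs/<refname> rule.

```shell
# Hypothetical helper: run git in directory $1 with a throwaway identity.
g() { dir=$1; shift; git -C "$dir" -c user.name=demo -c user.email=demo@example.com "$@"; }

work=$(mktemp -d)
git init -q "$work/repo"
g "$work/repo" symbolic-ref HEAD refs/heads/main

# Tag "demo" names the first commit.
g "$work/repo" commit -q --allow-empty -m "first"
first=$(g "$work/repo" rev-parse HEAD)
g "$work/repo" tag demo

# Branch "demo" names the second commit.
g "$work/repo" commit -q --allow-empty -m "second"
second=$(g "$work/repo" rev-parse HEAD)
g "$work/repo" branch demo

# Plain "demo": refs/tags/demo is found before refs/heads/demo.
# Git also warns on stderr that the name is ambiguous; hide that here.
plain=$(g "$work/repo" rev-parse demo 2>/dev/null)

# "heads/demo": matched earlier, as refs/heads/demo.
via_heads=$(g "$work/repo" rev-parse heads/demo)

echo "demo       -> $plain"
echo "heads/demo -> $via_heads"
```

This is also why it is a bad idea to give a branch and a tag the same name: git rev-parse follows the documented order, but not every command treats the clash the same way.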

Now, HEAD—and correspondingly, refs/remotes/origin/HEAD—is a special case: it's a symbolic reference, which in Git is analogous to a symbolic link in Unix/Linux file systems. (In fact, in early Git implementations, it simply was a symbolic link. That does not work well on Windows though, so now it's a file with contents.) The git for-each-ref command expands the link by default, but git branch -r doesn't, so that's one way to see this.
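To look at the symbolic ref directly, git symbolic-ref reads it without following it. A minimal sketch, assuming a throwaway clone in a temporary directory (the helper g and the demo identity are invented for the example):

```shell
# Hypothetical helper: run git in directory $1 with a throwaway identity.
g() { dir=$1; shift; git -C "$dir" -c user.name=demo -c user.email=demo@example.com "$@"; }

work=$(mktemp -d)
git init -q "$work/theirs"
g "$work/theirs" symbolic-ref HEAD refs/heads/master
g "$work/theirs" commit -q --allow-empty -m "first"
git clone -q "$work/theirs" "$work/ours"

# Read the symbolic ref itself: it holds a name, not a hash ID.
target=$(g "$work/ours" symbolic-ref refs/remotes/origin/HEAD)

# Both spellings end up at the same commit.
via_origin=$(g "$work/ours" rev-parse origin)
via_full=$(g "$work/ours" rev-parse origin/master)

echo "origin/HEAD -> $target"
g "$work/ours" branch -r    # shows the unexpanded "origin/HEAD -> origin/master" line
```

Here target should be refs/remotes/origin/master, and the two rev-parse results should match. If origin/HEAD is missing or stale in a real repository, git remote set-head origin -a asks the remote and sets it again.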

The bottom line

The conclusion of all of this is:

origin/HEAD is a symbolic ref for whatever branch is the HEAD in origin, usually master or main;
origin by itself is either a remote (as used by git fetch), or resolvable via step 6 of gitrevisions (as used by most other Git commands);
git rebase -i origin resolves it via origin/HEAD and step 6; but
git pull origin master doesn't use step 6 at all: the string origin is just a remote, and the string master gets mapped through the remote-tracking names to become origin/master (and in this particular case git pull actually sidesteps all this because it's using the .git/FETCH_HEAD file mechanisms, which predate all this stuff and go through somewhat different code paths).
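The question's three rebase spellings can be tried in a scratch clone. This is a sketch with assumptions spelled out: the repositories, the helper g, the branch name feature, and the demo identity are made up, and -i is left off so that nothing opens an editor. The relevant fact from the git rebase documentation is that its second argument is a branch in our own repository to switch to first, so it goes through local name lookup and never through the remote.

```shell
# Hypothetical helper: run git in directory $1 with a throwaway identity.
g() { dir=$1; shift; git -C "$dir" -c user.name=demo -c user.email=demo@example.com "$@"; }

work=$(mktemp -d)
git init -q "$work/theirs"
g "$work/theirs" symbolic-ref HEAD refs/heads/master
g "$work/theirs" commit -q --allow-empty -m "first"
git clone -q "$work/theirs" "$work/ours"

# Mirror the question: work on another branch, with no local master.
g "$work/ours" checkout -q -b feature
g "$work/ours" branch -q -D master

# One argument: "origin" is a revision, resolved through origin/HEAD.
g "$work/ours" rebase -q origin
one_arg=$?

# Two arguments: "master" must be a branch or commit in OUR repository.
g "$work/ours" rebase -q origin master 2>/dev/null
two_args=$?

# Our remote-tracking name for their master works as the upstream.
g "$work/ours" rebase -q origin/master
tracking=$?

echo "rebase origin:        exit $one_arg"
echo "rebase origin master: exit $two_args"
echo "rebase origin/master: exit $tracking"
```

The first and third commands should succeed and the second should fail, because no name lookup rule turns the bare word master into refs/remotes/origin/master.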

The git pull command passes most of its flags and arguments on to git fetch, except for some flags that it passes to the second command, and some flags that it uses itself. It's enormously complicated because of historical ... mistakes? ideas? concepts? anyway, history of the way Git used to work, which must be preserved in amber for the next 300 million years, or whatever. 😀 (Seriously though, the Git folks take compatibility itself quite seriously and try not to break existing uses and workflows.)

Upvotes: 0
