Reputation: 61

How to create an empty master and raise a PR from existing branch

I have a single branch in my repo: develop. Code has already been committed and pushed in there.

There's no master branch in the repo.

I'd like to create a master branch so I can raise a Pull Request into it.

I've already tried two approaches:

1. Create the master branch from develop:

git checkout develop
git checkout -b master develop

The problem here is there's nothing to compare: master and develop are already the same.

2. Create the master branch from an orphan:

git checkout --orphan master
git reset --hard

The problem here is the branches have different history and I get: "there isn’t anything to compare. (branches) are entirely different commit histories"

git version 2.17.2 (Apple Git-113)

How do I create an empty master so I can raise a Pull Request from the already existing develop?

Upvotes: 6

Answers (2)

RaulGupta

Reputation: 361

The detailed explanation from torek was really helpful in solving this issue. Let me summarize the steps you can follow to resolve it:

1. Begin by finding the hash of the initial commit. You can do this through the UI by navigating to the "README.md" file, selecting "Blame," and then clicking on "History." Open the most recent commit and copy the 40-character hash associated with it.
2. Next, execute the following Git commands:

-> git checkout -b master

-> git reset --hard <commit hash copied in step 1>

-> git push origin master

These commands will create a new branch called "master," reset it to the commit specified by the hash obtained in step 1, and push the changes to the remote repository.

Once these steps are completed, you can proceed to the UI and create a pull request (PR) from the "develop" branch to the "master" branch.
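
If you would rather not go through the web UI, the root commit can also be found from the command line. The following is a minimal sketch, assuming a throwaway repository built only for the demo (the three empty commits and the user identity are illustrative). It also swaps `git checkout -b` plus `git reset --hard` for a single `git branch`, which creates the name without touching the work-tree:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/develop   # start on develop, whatever the default name is
git config user.name "Example User"        # illustrative identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"
git commit -q --allow-empty -m "third"

# The root commit is the one with no parents.
root=$(git rev-list --max-parents=0 develop)

# Create master at the root commit without switching branches.
git branch master "$root"

# develop is now ahead of master, so a PR has something to compare.
ahead=$(git rev-list --count master..develop)
echo "develop is $ahead commits ahead of master"
```

In a real repository you would follow this with `git push origin master`. Note that a repository can have more than one root commit, in which case `git rev-list --max-parents=0` prints several hashes and you have to pick one.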

Upvotes: 0

torek

Reputation: 489173

Just create a branch name. That's all there is to it, really. Pick some existing commit and make the branch name master identify that commit. That's now your master branch.

There is no such thing as an "empty branch"

There is a problem already, though, because the word branch is ambiguous, in Git. (See What exactly do we mean by "branch"?) When we say "branch", sometimes we mean a name, like master or develop, and sometimes we mean something else.

What Git really stores is a series of commits. Each commit has a unique number assigned to it: a hash ID, which looks like 8dca754b1e874719a732bc9ab7b0e14b21b1bc10 for instance. You see these in git log output.

Each commit holds a complete snapshot of your project, as a set of files (not folders, just files: the folders, if any, will be created as needed to hold the files, at the time you run git checkout). And, each commit has a bit of metadata: information about the commit, such as who made it, when, and why (its log message), and—crucially—the raw hash ID of the previous commit.
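
To see this metadata for yourself, you can print a raw commit object. This is a small sketch in a scratch repository (the commit messages and identity are made up for the demo); `git cat-file -p` shows the tree (the snapshot), the parent hash, the author and committer lines, and the log message:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/develop
git config user.name "Example User"        # illustrative identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

# Print the raw contents of the latest commit object.
git cat-file -p HEAD

# Pull out just the line that records the previous commit.
parent_line=$(git cat-file -p HEAD | grep '^parent ')
echo "$parent_line"
```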

It's these raw hash IDs that make up backwards-looking chains of commits. If we use a single uppercase letter to fake out the hash IDs—this is much more useful for humans but obviously won't work for thousands of commits—we can draw a simple three-commit repository like this:

A <-B <-C

Here commit C is the latest commit. It records that commit B is the earlier commit, so we say that C points to B. Another way to say this is that B is C's parent. Meanwhile commit B says that commit A comes before B. Commit A is the very first commit anyone ever made, and therefore has no parent. This lets the action stop: history starts—or ends, if you're working backwards like Git does—here.
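
The same three-commit chain can be reproduced in a scratch repository. In this sketch the commit messages A, B, and C stand in for the hash IDs (an assumption made purely for readability), and `git log` walks backwards from the last commit exactly as described:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/develop
git config user.name "Example User"        # illustrative identity
git config user.email "user@example.com"
for msg in A B C; do
  git commit -q --allow-empty -m "$msg"
done

# Git starts at the last commit and follows parent links backwards.
order=$(git log --format=%s develop | xargs)
echo "walk order: $order"

# The very first commit has no parent, so %P prints nothing for it.
root_parents=$(git log -1 --format=%P develop~2)
echo "parents of A: '$root_parents'"
```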

Sometimes, when we say branch, we mean the whole series of commits. This chain of commits, A-B-C, starting from the end and working backwards, is a branch. You direct Git to find the last commit, C, which then points back to the earlier commit B, which points back again. No matter how many commits there are, we can always have Git start at the end, and work its way all the way back to the beginning, as long as we know which commit is the last commit. But how do we find the last commit?

In this example it's easy: there are only three commits, and we made them use sequential letter names, so obviously C is the last one. But in a real repository, there may be thousands of commits:

$ git rev-list --all --count
58314

and their hash IDs look totally random. How do we know which commit is the last one? Furthermore, what if we have a series of commits like:

             I--J
            /
...--F--G--H
            \
             K--L

where there are two "last" commits?

This is where branch names come in

A branch name, like master or develop, just holds the actual hash ID of the last commit that we want to call "part of the branch":

             I--J   <-- master
            /
...--F--G--H
            \
             K--L   <-- develop

By making the name master point to commit J, we declare that J is the last commit of the branch master. By making the name develop point to commit L, we declare that L is the last commit of the branch develop.

Note that commits H, and G, and F, and whatever else comes earlier, are on both branches. This is a peculiar feature of Git: usually, most commits are on every branch. It's only the last few—or few hundred, or whatever—that are only on one or two or ten of the branches.
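
You can check which commits are shared and which are unique to one branch. Here is a rough sketch that rebuilds the diagram in a scratch repository, again using the letters as commit messages (an illustrative assumption); `git merge-base` finds the commit where the two chains meet:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/develop
git config user.name "Example User"        # illustrative identity
git config user.email "user@example.com"

# Shared history: F--G--H
for msg in F G H; do git commit -q --allow-empty -m "$msg"; done
git branch master                          # master and develop both point at H

# develop grows K--L, master grows I--J
for msg in K L; do git commit -q --allow-empty -m "$msg"; done
git checkout -q master
for msg in I J; do git commit -q --allow-empty -m "$msg"; done

base=$(git merge-base master develop)      # last commit that is on both branches
base_msg=$(git log -1 --format=%s "$base")
only_master=$(git rev-list --count develop..master)
only_develop=$(git rev-list --count master..develop)
echo "shared tip: $base_msg, only on master: $only_master, only on develop: $only_develop"
```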

For a branch name to exist, it must point to some existing commit. You can make commits without using branch names—this is slightly tricky: it uses what Git calls detached HEAD mode—but you can't have a branch name unless it points to some actual commit. By pointing to that commit, that branch name declares that that commit is the last commit in that branch.
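
A quick way to convince yourself of both points: list the names with the hashes they hold, then try to create a name that points at a commit that does not exist. This is only a sketch in a scratch repository; the all-`f` hash is a made-up value chosen because no such object exists there:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/develop
git config user.name "Example User"        # illustrative identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "first"

# Each branch name is just a name plus one commit hash.
git for-each-ref --format='%(refname:short) -> %(objectname)' refs/heads

# A name cannot be created for a commit that does not exist.
if git branch ghost ffffffffffffffffffffffffffffffffffffffff 2>/dev/null; then
  created=yes
else
  created=no
fi
echo "ghost branch created: $created"
```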

This is true even if the name points to a commit in the middle of someone else's chain. Suppose that you don't have a master, but do have:

...--F--G--H--K--L   <-- develop

You can now tell Git: create the name master, pointing to commit H by finding the actual hash of H and running:

git branch master <hash-of-H>

or:

git checkout -b master <hash-of-H>

Now you have:

...--F--G--H   <-- master
            \
             K--L   <-- develop

Note that the commits did not change at all. You just added the label master—a branch name—to remember the hash ID of commit H.
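
The claim that nothing but the label changes is easy to verify. In this sketch (scratch repository, letters as commit messages, and `develop~2` used as a stand-in for "the hash of H"), the total number of commits is the same before and after creating master:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/develop
git config user.name "Example User"        # illustrative identity
git config user.email "user@example.com"
for msg in F G H K L; do git commit -q --allow-empty -m "$msg"; done

before=$(git rev-list --all --count)

# H is two commits behind the tip of develop in this demo.
h=$(git rev-parse develop~2)
git branch master "$h"

after=$(git rev-list --all --count)
echo "commits before: $before, after: $after"
echo "master is at: $(git log -1 --format=%s master)"
```

Because master now sits behind develop, a pull request from develop into master would have commits K and L to show.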

A branch name remembers the hash ID of one commit, but has another special feature

Now that you have the name master identifying commit H, you can:

git checkout master

and then do some work and make a new commit. When you make a new commit, Git packages up a new snapshot of your project and writes that out. Git adds your name as the author and committer of this new commit, sets up its date-and-time-stamps, and uses your log message as the reason this commit now exists. Git adds the raw hash ID of existing commit H as the new commit's parent. Git then saves the new commit into the database of "all existing commits", which assigns the new commit its new, unique hash ID. We'll just call this new hash ID I:

             I   <-- ???
            /
...--F--G--H   <-- ???
            \
             K--L   <-- develop

Now there's the tricky part: having created commit I, Git moves the current branch name so that it points to the new commit. Since the current branch name is master, Git changes master so that it points to new commit I:

             I   <-- master (HEAD)
            /
...--F--G--H
            \
             K--L   <-- develop

You should ask yourself two questions now:

1. How did Git know that it should move master and not develop?
2. What happens to existing commit H?

The answer to Q1 here is in the diagram: I have attached the special name HEAD to the name master. When you use git checkout to select a branch, Git attaches this name, HEAD, to that branch. That not only tells Git which commit you have checked out right now, but also which name needs to be updated when you make a new commit.
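
You can watch HEAD do this. The sketch below (scratch repository, letters as commit messages, both illustrative) checks out master, asks Git which name HEAD is attached to, and then confirms that a new commit moves master while develop stays put:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/develop
git config user.name "Example User"        # illustrative identity
git config user.email "user@example.com"
for msg in F G H K L; do git commit -q --allow-empty -m "$msg"; done
git branch master develop~2                # master points at H

git checkout -q master                     # attaches HEAD to master
attached=$(git symbolic-ref --short HEAD)
develop_before=$(git rev-parse develop)
h=$(git rev-parse master)

git commit -q --allow-empty -m "I"         # new commit I, parent H

parent_of_new=$(git rev-parse master~1)
echo "HEAD is attached to: $attached"
echo "master now at: $(git log -1 --format=%s master)"
```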

The answer to Q2 is not as obvious until you find that there's a general principle of Git, which is: once a commit is made, that commit is frozen for all time. Nothing inside any existing commit can ever be changed, not by you and not by Git. The reason for this is that the actual hash ID of the commit is a checksum of the data inside the commit. Change any of the data, even a single bit, and you change the checksum: you get a new, different commit, not the original commit after all. The original commit remains undisturbed.
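
Both halves of that principle can be checked by hand. This sketch, run in a scratch repository with a made-up file, recomputes the hash of a commit from its raw contents with `git hash-object`, then "changes" the commit with `git commit --amend` and shows that the result is a new commit while the old object is still there:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/develop
git config user.name "Example User"        # illustrative identity
git config user.email "user@example.com"
echo "hello" > file.txt
git add file.txt
git commit -q -m "first"

old=$(git rev-parse HEAD)

# The hash ID is a checksum of the commit's own contents.
recomputed=$(git cat-file commit "$old" | git hash-object -t commit --stdin)

# "Changing" the commit really makes a new one with a new hash.
git commit -q --amend -m "first, reworded"
new=$(git rev-parse HEAD)

# The original object is untouched and can still be read.
still_there=$(git cat-file -t "$old")
echo "old: $old"
echo "new: $new"
echo "old object type: $still_there"
```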

The files inside a commit are all frozen, read-only, and saved for all time. They're also compressed, and in a special Git-only format. This helps keep Git repositories from becoming instantly huge: if every commit saves every file every time—which it does—how will the repository's size not get totally out of hand? One of the tricks Git uses is that if you're saving the same version of a file, it just re-uses the existing frozen copy. It can do that because the frozen copy can't change.
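
The re-use is visible if you compare the hash ID of the same file in two commits. In this sketch (scratch repository, file names invented for the demo), `a.txt` is unchanged between the two commits, so both commits refer to the very same stored object:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/develop
git config user.name "Example User"        # illustrative identity
git config user.email "user@example.com"

echo "unchanged content" > a.txt
git add a.txt
git commit -q -m "add a.txt"

echo "something new" > b.txt
git add b.txt
git commit -q -m "add b.txt"

# Ask each commit which stored object it uses for a.txt.
blob_old=$(git rev-parse HEAD~1:a.txt)
blob_new=$(git rev-parse HEAD:a.txt)
echo "a.txt in first commit:  $blob_old"
echo "a.txt in second commit: $blob_new"
```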

(I like to call these frozen, Git-only copies of files "freeze-dried". The freeze-dried files don't really have names, inside the Git repository: they have hash ID numbers instead. Note that they have to be thawed out and "rehydrated" into ordinary files for you to use them—the frozen copy is useless for getting any new work done. That's why git checkout makes a work-tree, where you can do your work. The freeze-dried files in the commit become normal files in the work-tree, with Git first creating any folders needed to hold them, using the freeze-dried names stored with the commit.)

Every Git repository has its own individual branch names

Usually, we use git clone to make our repositories, and then send our work back to whichever repository we started with later, with git push. The git clone process copies the commits from the original repository. But it doesn't exactly copy the branch names. Instead, it takes their branch names—their repository's master and develop and so on—and renames them, calling them origin/master and origin/develop and so on.

Having just made a fresh new clone of their Git, our Git has no branch names at all! It has all of their branch names, renamed to our origin/* remote-tracking names.¹ But our Git would like us to have a branch—typically master, but if they didn't even have a master, our Git will pick some other name now, such as develop. As the last step of git clone, our Git must:

1. pick a branch name,
2. create that branch name in our repository, pointing to the same commit as the origin/name name, and
3. git checkout that commit by that branch name.

The name picked in the first step is:

the name you supplied in your -b argument if you said git clone -b name, or
the name their Git recommended, if they made a recommendation, or
master.

That last master is a special case, used as a sort of last resort.²

So, if you clone a repository that only has a develop, you'll get your own origin/develop name—a remote-tracking name, not really a branch at all—pointing to the last commit in their develop. But then as the last step of git clone, your Git will create your own develop, pointing to this same commit, and then git checkout develop so that you're on your one single branch name, develop, and have this one commit checked out.
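
Here is a small local experiment along those lines. It is a sketch only: the directory names `upstream` and `clone` are invented, and an ordinary non-bare repository on disk plays the part of "their Git":

```shell
set -e
work=$(mktemp -d)

# "Their" repository: a single branch, develop, with one commit.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/develop
git config user.name "Example User"        # illustrative identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "first"

# "Our" repository, made by cloning theirs.
cd "$work"
git clone -q upstream clone
cd clone

git branch -a                              # our develop plus the origin/* names
current=$(git symbolic-ref --short HEAD)
echo "checked out: $current"
```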

If you now create the new name master in your repository, you get this:

...--o--o   <-- develop, master, origin/develop

(with HEAD attached to either develop or master, depending on how you created the name master: did you use git branch, or git checkout -b?).

You can now run git push origin master to tell their Git: create a branch name master pointing to this same commit as your develop. What git push does is send their Git any commits that you have that they don't (any commits you've made that they will need for the "create or update some branch name" part) and then ask them, politely, to create or update some of their branch names to match some of your branch names:

git push origin branch1 branch2 branch3

has your Git call up the Git at origin (at the URL stored under your name origin) and send them any commits they need, then ask them to set their branches named branch1, branch2, and branch3 to point to the same commits that your names branch1, branch2, and branch3 point to. Since they can only do this if they have those commits, your Git will first send them the commits, if they don't have them. Your Git will not only send them these tip commits (for the three branches), but also any history (any earlier commits) needed to connect those tip commits to the rest of the commits in their repository.
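
To try this without touching a real server, you can let a bare repository on disk play the role of origin. The sketch below makes that assumption (the paths `origin.git` and `local` are invented) and creates master one commit behind develop, so that the two names differ after the push:

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/origin.git"      # stands in for the hosting server

git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/develop
git config user.name "Example User"        # illustrative identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

git remote add origin "$work/origin.git"
git push -q origin develop                 # sends both commits, creates their develop

git branch master develop~1                # our master: one commit behind develop
git push -q origin master                  # no new commits needed, just a new name

theirs=$(git -C "$work/origin.git" rev-parse refs/heads/master)
echo "their master: $theirs"
```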

¹Git documentation calls these remote-tracking branch names. But they're not technically branch names. In particular, if you git checkout origin/master for instance, you wind up with a detached HEAD. So I prefer to just call them remote-tracking names, dropping the word branch entirely.

²If you clone a totally empty repository, with no commits, their Git has no branch names, because it has no commits and a branch name requires a commit. So they don't recommend anything, in this case. Your Git must then use master as your branch name. Of course, you don't have any commits either. So you now wind up with the same peculiar state you have when you create a new, empty repository with git init: you're on branch master, but branch master doesn't exist! You're on a non-existent branch. Other parts of Git call this an orphan branch. It's a weird state in which your next commit creates the branch you're on, so that you can be on it properly, as it were. Until then, your HEAD just records the name of the branch to create.
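
The peculiar state described in the second footnote can be observed directly. In this sketch the name master is set explicitly, since the default initial name depends on the Git version and configuration:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master    # be explicit: the default name varies

# HEAD already records a branch name...
name=$(git symbolic-ref --short HEAD)

# ...but that branch does not exist yet, because there is no commit.
if git show-ref --verify --quiet refs/heads/master; then
  exists_before=yes
else
  exists_before=no
fi

git config user.name "Example User"        # illustrative identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "first"     # the first commit creates the branch

if git show-ref --verify --quiet refs/heads/master; then
  exists_after=yes
else
  exists_after=no
fi
echo "on '$name'; existed before first commit: $exists_before, after: $exists_after"
```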

Review

Here's what you have learned (and need to know):

Git stores commits, which store files, in a frozen Git-only format. Your Git will extract the files into a work-tree where you can see and work with them. The work-tree is not part of the repository database and does not get copied around: only commits get copied wholesale like this. (You can make branch names get copied, but that's one name at a time, not en-masse like commits.)
The commits are identified by hash IDs. The hash IDs are unique to each commit. The git log command will show you the hash IDs. Every Git in the universe must agree on the hash IDs.³
Each commit stores its predecessor commit's hash ID. These form backwards-looking chains.
Git uses branch names to find tip commits. Each name stores one hash ID: the ID of the commit that should be considered the end of the branch. Git works backwards from there.
History is nothing more than a backwards-looking chain of commits.
The commits are what get shared—what matter most with git clone, git fetch, and git push. The names matter too, because they're how each Git will find its commits, but the names are local to each Git: they're not necessarily universal, like the hash IDs are. In particular, branch names are expected to move, over time.
Branch names automatically move to encompass new commits, as you make new commits. When they do move like this, the existing earlier commits just become part of history. If you were to move a branch name "backwards"—so that the historical commit becomes the tip of the branch (you can do this, we just haven't covered how)—it may be very hard to find the later commits! (Do you know their hash IDs? There are ways to find them, for a while.)
The word branch is ambiguous: when you say it or hear it, think about whether you mean branch name, or some series of commits, starting at the last one and working backwards. If someone says remote branch, did they mean the branch name, such as master, as seen in the other Git? Or did they mean the remote-tracking name origin/master as seen in my Git? These are different names, and may point to different commits! A branch name in some other Git may change between now and the next time you look. Who has access to that other Git repository, and what are they doing to it right now?
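
One of the "ways to find them, for a while" mentioned in the list above is the reflog, which records the previous values of each branch name in your own repository. A minimal sketch in a scratch repository, using `git reset --hard` as one possible way to move a name backwards and an invented name `rescue` for the recovered tip:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/develop
git config user.name "Example User"        # illustrative identity
git config user.email "user@example.com"
for msg in first second third; do git commit -q --allow-empty -m "$msg"; done

tip=$(git rev-parse HEAD)

# Move develop backwards; no name points at "third" any more.
git reset -q --hard HEAD~2

# The reflog still remembers where develop pointed one step ago.
recovered=$(git rev-parse 'develop@{1}')

# Give the lost tip a name again so it is easy to find.
git branch rescue "$recovered"
echo "rescued commit: $(git log -1 --format=%s rescue)"
```

Reflog entries are local to your repository and expire eventually, so this is a safety net rather than permanent storage.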

³There's a future complication coming to Git—no one knows exactly when, yet—in which hash IDs will be renumbered, going from SHA-1 to SHA-256. Exactly how that will be handled is not yet defined.

Upvotes: 8