Git: I forgot to create branch master. How can I create it if my repository already has another branch?

Question

I've run into a problem with my repository and don't know how to resolve it. Please correct me if I've done something wrong.

I created a repository on Bitbucket, cloned it, and got the message "You appear to have cloned an empty repository". Next, I created a new branch, my-feature, with the command git checkout -b my-feature and added some code. Finally, I pushed to the remote repository with the command git push --set-upstream origin my-feature. It all worked, but I just realized that my repository has no master branch. Now it has only the branch my-feature.

How can I create a master branch while still keeping the my-feature branch (that is, without creating a new repository)?

torek · Accepted Answer

What you have now is a Git repository—or rather, two very similar repositories; we'll just think of it as one for now—with just one branch name, my-feature. This is not an error situation. There is no requirement that any Git repository have a branch name master in it.

Still, if you'd like to have a branch name master, all you have to do is create one. There's nothing special about the name master.¹ A branch name in Git is just a way to find one specific commit. Git is not about branches. Git is all about commits.

¹Well, almost nothing: there are a bunch of little random bits of stuff here and there. For instance, most humans will assume that the name master means something. 😀

Git is about commits

To understand what's going on here, and why this is OK as is but you can create a master whenever you like, let's look at how Git really works. Again, Git is all about commits. So the main thing to know is what a commit is, how we find them, and how they accumulate in a repository.

The first thing to know is that every commit is numbered. These numbers are really big, ugly, and weird: they don't simply count up like 1-2-3. For instance, this commit is numbered 71ca53e8125e36efbda17293c50027d31681a41f. The number on any given commit is totally unique to that one commit. If you had this commit in your Git repository, it would have this same number. If you don't have this commit (and you probably don't, because it's a commit in the repository for Git itself), then you don't have any commit with this number.

The uniqueness property is why these numbers are so big and ugly. They're computed by running a cryptographic hash function over the contents of the commit. This has a consequence: the number is deeply attached to the contents, so the contents can never change. No part of any commit—any internal Git object, really—can ever change, because its number depends on its content. This is the magic: this is how two different Git programs can agree that only this commit gets this number.²
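To make the "number depends on the content" idea concrete, here is a small Python sketch of how Git computes an object's ID in a SHA-1 repository: it hashes a short header (the object type, a space, the content length in bytes, and a NUL byte) followed by the raw content. The function name git_object_id is just an illustrative name. The example only hashes blobs (file contents); commits are hashed the same way, but their content is a text record of the tree, parents, author, committer, and log message.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Compute a Git-style SHA-1 object ID (sketch; SHA-1 repositories only).

    Git hashes a header of the form "<type> <size>\\0" followed by the raw
    object content, and uses the hex digest as the object's ID.
    """
    header = f"{obj_type} {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


# An empty file gets the same blob ID in every Git repository.
print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Changing a single byte of content produces an unrelated ID.
print(git_object_id("blob", b"hello\n"))
print(git_object_id("blob", b"hello!\n"))
```

If you have Git installed, running git hash-object on a file containing exactly the same bytes should print the same ID, which is a quick way to convince yourself that the number depends only on the content.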

Because the numbers come out of a hashing function, they are called hash IDs. Because the original hash function was (and still is for now) SHA-1, they're also called SHA-1 IDs or SHA-1s. Because Git is in the process of going to still-larger hashes, Git is changing the internal name from SHA1 to OID, or Object ID. (Commits are one of four internal object types, and all of them use this same hashing system.)

I mostly call these hash IDs myself, but be aware of the other names.

The other thing to know is that each commit stores two parts:
- The main data of a commit is a snapshot of all the files Git knew about at the time the commit was made. We won't go into the details of where the snapshot comes from here, but it isn't your working tree. It comes from the thing Git calls by three names: the index, the staging area, or the cache. All three names refer to the same thing.
- Besides the snapshot, each commit contains some metadata, or information about the commit itself. This includes the name and email address of the author of the commit, for instance, with a date-and-time-stamp for when the commit was made. The log message you enter, explaining why you made the commit, goes here; you'll see that log message in git log output.
  
  Crucially for Git, Git sticks something in this metadata for its own use. Each commit stores a list of the hash IDs of some earlier commit or commits.
  
  Most commits store just one hash ID, for one earlier commit. We can call these ordinary commits to distinguish them from commits that have no earlier hash IDs, or two-or-more earlier hash IDs. At least one commit—the very first one—literally can't store any earlier commit hash ID, so it just doesn't; we call that one the root commit.³
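The two-part structure described above (a frozen snapshot plus metadata that includes a list of earlier hash IDs) can be sketched as a Python data class. This is a toy model, not Git's storage format: the field names are assumptions chosen for illustration, and single-letter strings like "A" stand in for real hash IDs. Using frozen=True mirrors the rule that no part of a commit can ever change. Commits with two or more parents are what Git calls merge commits.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Tuple


@dataclass(frozen=True)
class Commit:
    """Toy commit record: a read-only snapshot plus metadata (illustrative only)."""

    snapshot: Tuple[Tuple[str, str], ...]  # (path, file contents) pairs
    author: str
    timestamp: int                         # seconds since the Unix epoch
    message: str
    parents: Tuple[str, ...] = ()          # hash IDs of earlier commits

    def kind(self) -> str:
        if not self.parents:
            return "root"      # nothing earlier to point back to
        if len(self.parents) == 1:
            return "ordinary"  # the usual case: exactly one earlier commit
        return "merge"         # two or more earlier commits


first = Commit((("README", "hello\n"),), "A U Thor <author@example.com>",
               1600000000, "initial commit")
# "A" stands in for the hash ID of `first`.
second = Commit((("README", "hello, world\n"),), "A U Thor <author@example.com>",
                1600000100, "expand greeting", parents=("A",))
print(first.kind(), second.kind())  # root ordinary

try:
    second.message = "rewritten"  # any attempt to change a commit fails
except FrozenInstanceError:
    print("commits are read-only")
```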

²The pigeonhole principle tells us that this scheme must eventually fail. The number of bits in the hash ID is chosen so that the failure is so many trillions of years in the future that we don't care about it. There's a small flaw in this idea, but it's fine for now.

³A repository can have more than one root commit, but this is at least a little bit unusual. We won't get into the details here.

Hash IDs are too clunky: enter branch names

Let's draw a simple repository that has just three commits in it. Rather than their actual big ugly hash IDs, we'll call these three commits A, B, and C—and we'll draw them like this:

A <-B <-C

Remember that each one holds a snapshot and some metadata. Commit C is the last of these three commits, so it's the most recent and, in a way, the most important. Inside commit C, we have the latest snapshot, in a read-only form. We also have the metadata, including the hash ID of earlier commit B. Let's be "on" commit C, and use this hash ID to find commit B.

Inside commit B, we have the snapshot and metadata, including the hash ID of earlier commit A. Without yet going to commit A, we can compare the files saved in both B and C. All of the files that are the same are uninteresting, but for files that have changed, we can show the changes. That's pretty useful—so that's what git show or git log -p will do, if we're on/using commit C: it will show the changes from B to C.

If we use git log, we can now have Git go back one step, from C to B. Now we have a snapshot and metadata, including the hash ID of commit A, but this time we'll go ahead and look up commit A. By comparing its snapshot to that in B, we can see what changed. So git log -p can print the log message for commit B, then show the changes from A to B.

Once again, we can have Git step back one hop, to commit A. Commit A, being the very first commit, has no earlier commit: its list of previous commit hash IDs is empty. So commit A is our root commit, and all the files in A are "new". The git log -p command will just show them as new files, and since there's no earlier commit, it will stop here.

Note that Git works backwards. This is generally true of all things in Git: they always work backwards, from latest towards earliest. The reason for that is those embedded hash IDs: they look like arrows pointing backwards. We start with commit C because it's the most recent commit. We do, however, have to know the hash ID of commit C.
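The backwards walk that git log performs can be sketched in a few lines of Python. Here the repository is modeled as a dictionary from stand-in hash IDs to tuples of parent hash IDs; that layout is an assumption for illustration, not how Git stores commits. To keep it short, the sketch follows only the first parent of each commit, which is roughly what git log --first-parent does; plain git log visits every parent.

```python
from typing import Dict, Iterator, Optional, Tuple


def walk_back(parents: Dict[str, Tuple[str, ...]], start: str) -> Iterator[str]:
    """Yield hash IDs from `start` back to a root commit, following first parents."""
    current: Optional[str] = start
    while current is not None:
        yield current
        earlier = parents[current]
        # A root commit has no parents, so the walk stops there.
        current = earlier[0] if earlier else None


# The three-commit repository from the drawing: A <-B <-C
graph = {"A": (), "B": ("A",), "C": ("B",)}
print(list(walk_back(graph, "C")))  # ['C', 'B', 'A']
```

Notice that nothing in graph lets us go forwards: each entry records only earlier commits, so we have to be told that C is where to start.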

We could write down the hash ID of the latest commit. We could keep it on a scrap of paper, or a whiteboard, or whatever. Starting from the end, we tell Git to look at C, and Git can find all the earlier commits on its own. But this seems silly. We have a computer. Why not have the computer keep the hash ID of commit C somewhere?

This is what a branch name does. A branch name simply holds the hash ID of the latest commit that is part of that branch. We can draw that like this:

A <-B <-C   <--branch

To make a new commit, we have Git package up a snapshot and metadata. The metadata for our new commit, which we'll call D, will include the actual hash ID of commit C, as found by reading the hash ID stored under the branch name branch. So new commit D will point back to existing commit C:

A--B--C
       \
        D

(I've gone to lines instead of arrows because I don't have good graphics for arrows here. We know nothing inside any commit can change; the arrows coming out of commits are stored inside those commits and always point backwards, so they can't change where they point. The lines work just as well, as long as we remember that Git can't follow them forwards, only backwards.)

Now that commit D exists and has a hash ID (computed by hashing all the stuff in the commit, including the date-and-time-stamp for when we created commit D), we can have Git write that hash ID into the name branch, so that the name points to commit D instead of commit C:

A--B--C
       \
        D   <-- branch

and now we can straighten the whole thing back out again:

A--B--C--D   <-- branch
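Here is a minimal sketch of what a branch name is and how committing moves it, using the same kind of toy model: one dictionary maps stand-in hash IDs to their parents, and another maps branch names to hash IDs. The helper name make_commit is hypothetical, and it skips the snapshot and metadata that real Git writes; it shows only the order of operations described above.

```python
from typing import Dict, Tuple

# Toy repository: hash ID -> parent hash IDs, and branch name -> hash ID.
commits: Dict[str, Tuple[str, ...]] = {"A": (), "B": ("A",), "C": ("B",)}
branches: Dict[str, str] = {"branch": "C"}


def make_commit(branch_name: str, new_id: str) -> None:
    """Add a commit on `branch_name`; `new_id` stands in for a computed hash ID."""
    parent = branches[branch_name]  # read the latest hash ID stored under the name
    commits[new_id] = (parent,)     # the new commit points back to it
    branches[branch_name] = new_id  # last step: the name now points to the new commit


make_commit("branch", "D")
print(branches["branch"], commits["D"])  # D ('C',)
```

Existing commits are never touched: C still points to B. Only the name moved.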

Branch names find commits, regardless of how many branch names there are

Let's start with our three-commit setup again, without having made D yet. Let's call the first name main (as GitHub now does by default) instead of master, though in fact, any name will do fine. Let's draw that, but this time I want to add one more thing to our drawing:

A--B--C   <-- main (HEAD)

The new thing is this HEAD, in parentheses. We have this to mark which branch name we are using. Right now there's only one name, so there is only one name we can use, but we're about to change that by adding a new name.

Now let's create a new name, develop. We must pick some existing commit to make this name exist, because a branch name is required to point to some existing commit. So let's pick commit C, which is the latest on main and is the commit we're using right now. We run:

git branch develop

and get:

A--B--C   <-- develop, main (HEAD)

Note that both branch names now point to commit C. This means commit C is the last commit on both branches. That's perfectly fine in Git; it means all three commits are on both branches, too.

The special name HEAD is still attached to main, so we're still actually using the name main. Let's run git checkout develop, which does this:

A--B--C   <-- develop (HEAD), main

We're no longer using the name main as our current name. It still exists and still points to commit C, but now HEAD is attached to the name develop. That name also points to commit C, so nothing else has to change, and nothing else does change. We're still "on" commit C, but now, we're "on" it because we're "on" the name develop.

Now we'll make our new commit D as before. Git will package up everything and write out a commit, which gets a new, unique hash ID. (If we put in exactly the same files, the same parent, author, and commit message, and make it at exactly the same time, we'll get the same hash ID we got last time. If the time is different, or anything else has changed, we'll get a totally different hash ID. I'm still going to call it D, though.)

As the last step of committing, Git will update the current branch name, but now that's develop, not main, so now we get:

A--B--C   <-- main
       \
        D   <-- develop (HEAD)

We still have two branch names; we still have HEAD attached to develop; but now we have a new commit D, and the name develop picks new commit D.

We can now switch to the latest main commit with git checkout main, which picks commit C, or the latest develop commit with git checkout develop, which picks commit D.
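The "HEAD is attached to a name" idea can be added to the toy model by storing a branch name, not a hash ID, in head. The class and method names below are hypothetical stand-ins for git branch, git checkout, and git commit, and they ignore files entirely.

```python
from typing import Dict, Tuple


class ToyRepo:
    """Toy model of branch names and HEAD (illustrative only, not Git's API)."""

    def __init__(self) -> None:
        self.commits: Dict[str, Tuple[str, ...]] = {"A": (), "B": ("A",), "C": ("B",)}
        self.branches: Dict[str, str] = {"main": "C"}
        self.head = "main"  # HEAD holds a branch *name*, not a hash ID

    def current_commit(self) -> str:
        return self.branches[self.head]

    def branch(self, name: str) -> None:
        """Like `git branch NAME`: a new name for the current commit."""
        self.branches[name] = self.current_commit()

    def checkout(self, name: str) -> None:
        """Like `git checkout NAME`, minus the files: attach HEAD to another name."""
        if name not in self.branches:
            raise KeyError(f"no branch named {name!r}")
        self.head = name

    def commit(self, new_id: str) -> None:
        """Only the branch that HEAD is attached to moves forward."""
        self.commits[new_id] = (self.current_commit(),)
        self.branches[self.head] = new_id


repo = ToyRepo()
repo.branch("develop")    # develop and main both point to C
repo.checkout("develop")  # HEAD is now attached to develop
repo.commit("D")
print(repo.branches)      # {'main': 'C', 'develop': 'D'}
```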

Note that the files in each commit are read-only, frozen in their form at the time, for all time. This means Git has to copy the files out of the commit, so that we can use and change them. We won't go into detail about this here, but it's something to keep in mind: the files you see and work with are not in the repository! They're copies, extracted from the repository.

How you got into this situation: branch names can't exist without commits

When you started out, you used Bitbucket to create an empty repository. Drawing an empty repository isn't very interesting:

👻

There are no commits, and for a branch name to exist, it must point to some existing commit. There aren't any! So no branch names can exist either.

You then cloned this empty repository, making a copy on your machine. When you did that, you got a warning:

You appear to have cloned an empty repository

Git gives you this warning because you're in this same weird state: with no commits, no branch names can exist.

Despite the fact that no branch names exist, Git still requires that the special name HEAD be "attached to" the current branch name. What happens in this case is that Git shoves an initial dummy name into the internal HEAD file, so that HEAD is attached to master or main or whatever name you picked as your default initial branch, or the name you gave to git init if your version of git init accepts one (git init --initial-branch=name).

At this point, you can run git checkout -b as often as you like. Each one just shoves a new name into the special name HEAD. That branch name still doesn't exist, yet it is the current branch name: you're in the weird special state of being on a branch that does not exist.

So, when you ran:

git checkout -b my-feature

you told your Git to set up your empty repository so that you were on the non-existent branch named my-feature.

When you make the first commit in this state, this creates the branch. The first commit anyone ever makes in an otherwise empty repository is a root commit, with no earlier commits:

A   <-- my-feature (HEAD)

This first commit, which we can call A regardless of what hash ID it gets, has no parent, so it just sits there. But now there is a commit! Now there can be a branch name! In fact, now there can be infinitely many branch names. They all just have to point to A. That's the only commit, so all branch names must point here. Let's make a branch xyzzy here by running git branch xyzzy (which uses the current-and-only-available commit):

A   <-- my-feature (HEAD), xyzzy

Once you create a second commit B, you have two commits:

A   <-- xyzzy
 \
  B   <-- my-feature (HEAD)

You can continue to create new branch names, and this time, you can pick either commit A or commit B for them to point to. So now you can make a master that points to either A or B.
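The empty-repository story can be sketched with the same kind of toy model. HEAD starts out holding a name (master here, as an assumed default) while the table of branch names is empty, a state often called an unborn branch. The method names checkout_b, branch, and commit are hypothetical stand-ins for git checkout -b, git branch, and git commit, and the error raised by branch is the model's own, not Git's wording.

```python
from typing import Dict, Tuple


class EmptyRepo:
    """Toy model of a freshly created or cloned empty repository (illustrative)."""

    def __init__(self, initial_branch: str = "master") -> None:
        self.commits: Dict[str, Tuple[str, ...]] = {}  # no commits at all...
        self.branches: Dict[str, str] = {}             # ...so no branch names either
        self.head = initial_branch  # HEAD still names a branch that doesn't exist yet

    def checkout_b(self, name: str) -> None:
        """Like `git checkout -b NAME`: create NAME at the current commit, if any."""
        if self.head in self.branches:
            self.branches[name] = self.branches[self.head]
        # With no current commit there is nothing to point to: only HEAD changes.
        self.head = name

    def branch(self, name: str) -> None:
        """Like `git branch NAME`: needs an existing commit to point to."""
        if self.head not in self.branches:
            raise ValueError(f"cannot create {name!r}: there are no commits yet")
        self.branches[name] = self.branches[self.head]

    def commit(self, new_id: str) -> None:
        parent = self.branches.get(self.head)  # None on an unborn branch
        self.commits[new_id] = (parent,) if parent is not None else ()  # root if None
        self.branches[self.head] = new_id      # this is what creates the branch name


repo = EmptyRepo()
repo.checkout_b("my-feature")
print(repo.branches)  # prints {}, because my-feature doesn't exist yet
repo.commit("A")      # first commit: a root commit, and now my-feature exists
repo.branch("xyzzy")
repo.commit("B")
print(repo.branches)  # {'my-feature': 'B', 'xyzzy': 'A'}
```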

What you need to do now

All you have to do now is pick some existing commit. Run git log to get a list of your commits on your (single) branch my-feature. If there's only one, that's the only one available. If there's more than one, those are the ones available. Pick one of them and tell git branch to put a branch name there. Let's say that one of the available hashes is a123456.⁴ Then run:

git branch master a123456

and Git will create, in your own repository, the name master, pointing to the existing commit whose hash ID starts with a123456.

You can also use git checkout -b master hash, which means "create the name, then attach HEAD to it". If you leave out the hash ID, both git branch and git checkout -b assume that you mean to use the current commit, as found by using the special name HEAD.
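The two forms of git branch described here (with and without an explicit starting commit) can be sketched as one Python function. This is a toy model with hypothetical names: branches maps branch names to hash IDs, head is the name HEAD is attached to, and a123456 and b234567 are made-up stand-ins for real hash IDs.

```python
from typing import Dict, Optional, Set


def create_branch(branches: Dict[str, str], known_commits: Set[str], head: str,
                  name: str, start: Optional[str] = None) -> None:
    """Toy version of `git branch NAME [START]`."""
    if name in branches:
        raise ValueError(f"a branch named {name!r} already exists")
    if start is None:
        start = branches[head]  # no START given: use the commit HEAD leads to
    if start not in known_commits:
        raise ValueError(f"{start!r} is not an existing commit")
    branches[name] = start      # a branch name is just a stored hash ID


# Hypothetical history: a123456 is the root, b234567 is the latest commit.
commits = {"a123456", "b234567"}
branches = {"my-feature": "b234567"}
create_branch(branches, commits, "my-feature", "master", "a123456")  # git branch master a123456
create_branch(branches, commits, "my-feature", "backup")              # git branch backup
print(branches)  # {'my-feature': 'b234567', 'master': 'a123456', 'backup': 'b234567'}
```

Either way, my-feature is untouched: creating a name never changes any existing name or commit.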

Now that you have the name, use git push to ask the Git over on Bitbucket to create the same name in their repository, using the same hash ID:

git push --set-upstream origin master

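The --set-upstream is optional, but it does something you probably want: it makes origin/master the upstream of your master, so that later git pull, git push, and git status work without extra arguments.⁵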

⁴An actual hash ID will be long and difficult to type in. Use copy-and-paste to grab the whole thing, or just type the first four or more characters. If you type in something that resembles the first part of a hash ID, Git will try to figure out whether that's short for a longer hash ID that starts with what you've typed in. This works as long as only one hash ID starts that way.

⁵See Why do I have to "git push --set-upstream origin "?
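The abbreviation rule from footnote 4 can be sketched as a small Python function: a prefix is accepted only if it is at least four hex digits long and matches exactly one known hash ID. This is a simplified model of the behavior described in that footnote, not Git's actual lookup code (Git also considers other kinds of objects and prints its own error messages). The function name expand_abbrev and the IDs in the example are made up.

```python
import string
from typing import Iterable


def expand_abbrev(prefix: str, known_ids: Iterable[str], min_len: int = 4) -> str:
    """Return the single full hash ID that starts with `prefix` (toy model)."""
    prefix = prefix.lower()
    if len(prefix) < min_len or any(c not in string.hexdigits for c in prefix):
        raise ValueError(f"{prefix!r} does not look like an abbreviated hash ID")
    matches = [h for h in known_ids if h.startswith(prefix)]
    if not matches:
        raise LookupError(f"no commit starts with {prefix!r}")
    if len(matches) > 1:
        raise LookupError(f"{prefix!r} is ambiguous: {len(matches)} commits match")
    return matches[0]


# Made-up 40-digit IDs that share the prefix "a123".
ids = ["a1234567" + "0" * 32, "a123ffff" + "0" * 32]
print(expand_abbrev("a12345", ids) == ids[0])  # True
# expand_abbrev("a123", ids) raises LookupError: both IDs start with "a123".
```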