AK91

Reputation: 731

Simple add local repo to remote repo's main branch (not master)

This is driving me up the wall!

This is what I've done:

  1. Created a folder with some code (local repo)
  2. Created a repo in Github e.g. my_repo
  3. Issued these commands (and trial and error'ed hella other stuff too):
git init && \
git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/main && \
git remote add origin https://github.com/my_github/my_repo.git && \
git checkout -b init_branch && \
git add * && \
git commit -m "Initial commit" && \
git push origin init_branch && \
git checkout main && \
git merge init_branch

The error I'm seeing:

 ! [rejected]        init_branch -> init_branch (fetch first)
error: failed to push some refs to 'https://github.com/my_github/my_repo.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

On occasions getting this error (not sure why):

error: pathspec 'main' did not match any file(s) known to git

What I've tried:

  1. Switching the default branch in Github from main to master and pushing straight to master - that worked but want to keep it as main.
  2. Not creating a new branch and attempting to push straight to main - doesn't work.
  3. Having git remote add origin ... straight after git init - doesn't work.
  4. Removed the symbolic-ref line, added git config --global init.defaultBranch main and git pull origin main. New error:
 * branch            main       -> FETCH_HEAD
 * [new branch]      main       -> origin/main
error: The following untracked working tree files would be overwritten by merge:
        .gitignore
Please move or remove them before you merge.
Aborting

Any help would be much appreciated

Upvotes: 0

Views: 843

Answers (1)

torek

Reputation: 490078

TL;DR

Make sure you create an empty repository on GitHub. Don't put in that initial commit. Then you can get rid of most of this stuff: you'll just create your new local repository, make a commit to create its first branch (renaming that branch to main if necessary), do a git remote add origin url, and git push origin main.

(You can then, if you wish, use git remote set-head origin --auto, although I personally find no value here.)
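
The TL;DR sequence can be sketched end to end without touching the network. This is only an illustration: a local bare repository stands in for the empty GitHub repository, and the temporary paths, the file app.py, and the Demo identity are placeholders, not anything from the question.

```shell
set -eu
# Placeholder identity so the commit works on a machine with no Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
# Stand-in for an EMPTY GitHub repository (no README, no .gitignore).
git init -q --bare "$work/my_repo.git"

mkdir "$work/local"
cd "$work/local"
git init -q
echo 'print("hello")' > app.py
git add app.py
git commit -q -m "Initial commit"          # creates the first branch
git branch -M main                         # rename it to main if it is not already
git remote add origin "$work/my_repo.git"  # a real setup would use the GitHub URL
git push -q origin main

# The stand-in remote now has a main branch holding our commit.
git ls-remote --heads origin
```

Because the remote starts with no commits, there is nothing to fetch or merge first, so the push cannot be rejected with "fetch first".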

Long

OK, first, don't do this at all:

git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/main && \

It's not inherently wrong but it has no value.¹ It has nothing to do with the errors you're getting, but removing it gets one more step out of the way and removes distractions from your view.

With that said, I think your problem stems from a fundamental misunderstanding of Git: of what it does, why, and how. To fix that, let's take a quick overview of Git repositories.


¹ I'm of the opinion that the existence of refs/remotes/origin/HEAD itself has no value to begin with, but even if you disagree, you should be using git remote or git ls-remote to obtain their symbolic HEAD here, rather than hard-coding main. Your remote-tracking names are supposed to reflect their branch names as they exist now, not as you wish they existed. 😀


Git's raison d'être is the commit, for which there are databases

A Git repository is, in the end, all about commits. To get there, Git has a big database of what it calls objects. There are four kinds of objects: commits, tags (annotated tags), trees, and blobs. The commits are the ones we (humans) mostly care about; the others just have supporting roles and we can mostly ignore them.

Each object, including the crucial commit objects, has a unique hash ID. This hash ID is how Git finds the object. Git stores the objects in a simple key-value database, with the hash ID being the key. So we need the hash ID to retrieve a commit. But hash IDs are big and ugly, e.g., 5a73c6bdc717127c2da99f57bc630c4efd8aed02. They need to be big and ugly because every commit in the universe has to have its own hash ID.²

But this makes them too difficult for humans to use. There's a solution for that though. Besides the commit-and-other-objects database, each Git repository also has a names database. Here, the database is still a key-value store, but this time the keys are things like branch and tag names, and the values are big ugly hash IDs.

Each name maps to exactly one hash ID. That's all we need, in Git, because a branch name automatically means "the latest commit on that branch", by storing the hash ID of the latest commit on that branch. (We'll skip over how that leads to all the rest of the commits, at least for now.)

The two databases, then, allow us to:

  • supply a name like master or main;
  • have Git turn that into a commit hash ID; and
  • have Git use that to retrieve the commit.

The commit itself holds two things: a snapshot of every file, in the form each file had at the time you (or whoever) made the commit, and some metadata, or information about the commit itself. Again, we won't go into any of the details here just yet. All we need to know right now is this two-database thing.
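
The name-to-hash-to-commit lookup can be watched directly with a couple of plumbing commands. This is a rough sketch in a throwaway repository; the file name and the Demo identity are placeholders.

```shell
set -eu
# Placeholder identity so the commit works on a machine with no Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo hello > file.txt
git add file.txt
git commit -q -m "first"

branch=$(git symbolic-ref --short HEAD)  # master or main, depending on configuration
hash=$(git rev-parse "$branch")          # names database: branch name -> hash ID
echo "$branch -> $hash"
git cat-file -t "$hash"                  # object database: type of the object under that key
git cat-file -p "$hash"                  # the commit itself: tree (snapshot) plus metadata
```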


² This is technically impossible: eventually, two different commits will get the same hash ID. Fortunately, there are two tools we can use here: (1) we can make the hash ID big and ugly so that the "eventually" does not happen for billions of years; (2) we can avoid introducing two Git repositories to each other if they have what I call doppelgängers: objects that should be different but have the same hash ID. In service of item 1, Git is in the process of moving to bigger hash IDs today.


git init

Running git init in a place (folder / directory) where there is as yet no Git repository will create a new, empty repository. This repository has no commits—its object database is essentially empty—and since names are required to hold valid object hash IDs, with branch names being required to hold commit hash IDs, the names database is also empty. There are no commits, so there cannot be any branch names.

Note that running git init in a place where there is a Git repository is mostly a big and slightly-slow nothing. Git will say that it "re-initialized" the repository: Git did some housekeeping type items, and checked up on some auxiliary items like templated Git hooks and so on. But the important parts of the existing repository—the two databases—are completely undisturbed by a git init.

With an empty repository, though, there are no commits and therefore no branches. And yet, even in an empty repository like this, you're "on" some branch: git status will say on branch master or on branch main or whatever. What's going on here?

More about the commit metadata

It's time to talk a bit more about commit data-and-metadata. We already mentioned that each commit has two parts:

  • the data: a full snapshot of every file, frozen for all time;
  • the metadata: information such as the name and email address of the author of the commit.

The snapshot is actually stored indirectly, through supporting objects. This allows the files stored inside commits to be de-duplicated. That way, since most commits mostly re-use files that are already in existing commits, the new commits don't need any space to store the files. They just latch on to the already-existing files. Since no part of any commit can ever be changed—not even by Git itself—it's quite safe to re-use pieces of old commits.

The metadata in a commit include things like when you made the commit and your log message: stuff that git log shows. But for Git's own use, each commit stores a list of the raw hash IDs—the big ugly "true names"—of previous commits. Most commits store exactly one entry in this list, which we call the parent commit. We say that these hash IDs point to the parent.

In a non-empty repository with, say, three commits, we could draw them like this:

A <-B <-C

Here C is the third and latest commit (with some actual big ugly hash ID, but we're just calling it C for commit). Inside commit C, in its metadata, we have the stuff that git log shows, plus the raw hash ID of earlier commit B, plus of course the data—the snapshot. So by reading C, Git can find commit B. This has a snapshot too. By comparing the two snapshots, Git can show us what changed between B and C, and that's what git log -p does.

Having shown commit C, though, git log now uses the parent hash ID to step back one hop, to commit B. B, like C, has data and metadata, and its metadata point to commit A. So git log can use the snapshots in A and B to show us what changed in B ... and then git log can move on—or rather, back—to commit A.

Commit A is a bit special. It has no parent. It can't! It was the first commit ever. There was no previous commit. So it just doesn't have any parents at all. Git knows that this means that "what changed" in commit A is that every file in it was added from scratch, and git log knows that having shown commit A, it can finally stop going backwards: there's nowhere left to go.
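
To see the backwards-pointing chain for yourself, something like the following should work in a scratch repository. The commit messages A, B, and C mirror the drawing above; everything else (paths, identity) is a placeholder.

```shell
set -eu
# Placeholder identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
for msg in A B C; do
  echo "$msg" > file.txt
  git add file.txt
  git commit -q -m "$msg"
done

# Each line is a commit hash followed by its parent hash(es), newest first.
# The last line (commit A) has no parent field at all.
git rev-list --parents HEAD

# The root commit is the one with zero parents.
root=$(git rev-list --max-parents=0 HEAD)
git log --format='%s: parents=[%p]'
```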

The fly in the ointment here should be pretty obvious. We had our git log work by "magically" knowing the hash ID of the last commit C. But there is no magic. Where did Git get commit C's hash ID?

The answer is: from a branch name. We have some name, like master or main, that holds commit C's hash ID. We say that this name points to the commit:

A <-B <-C   <--main

This name-and-value pair is stored in the names database, so Git just has to look that one up to get the hash ID for commit C.

Branch names find commits, and commits find commits

What this all means is that we use branch names to find the last commit of a branch, and then use the commits to find other, earlier commits. That's what a branch name in Git is all about. But there are several tricks:

  • We can have more than one branch name.
  • We can even have more than one branch name for one commit.

Let's draw the latter case:

A--B--C   <-- develop, main

(I've gotten deliberately lazy about drawing the arrows that lead backwards from one commit to the previous one here, mostly because the arrows in arrow-fonts don't always render very well on Stack Overflow, plus they are annoying to type in.)

We are going to be using commit C, no matter which branch name we pick, but we need to pick a branch name. We need to mark that branch name as "the name we're using", too. To do that, we'll attach the special name HEAD, written in all uppercase like this, to just one branch name:

A--B--C   <-- develop, main (HEAD)

We're now using the name main to find commit C.

If we run git switch develop or git checkout develop, the picture changes slightly:

A--B--C   <-- develop (HEAD), main

We're still using commit C, but now we're using it through the name develop.

Why does this matter? Well, let's make a new commit now. Without worrying about how we make new commits, we note that when we do make a new commit, Git does the following:

  1. Git gathers up all the metadata it needs, such as the user.name and user.email settings to put in as the name and email address of the author of the new commit.
  2. Git makes a snapshot of all of the files that should be frozen for all time in this new commit.
  3. Git adds the current commit to the metadata, as the parent. Git writes out the commit—the data-and-metadata—which computes a new unique hash ID. That's some big ugly hexadecimal number, but we'll just call it commit D here.
  4. Git writes D's hash ID to the current branch name.

Since commit D's parent is the current commit C in step 3, new commit D points backwards to existing commit C. But step 4 updates the name develop, because that's the name HEAD is attached to:

A--B--C   <-- main
       \
        D   <-- develop (HEAD)

HEAD thus provides two things:

  • it tells us which branch name is the current branch name, and
  • by having Git read the name, it tells us which commit is the current commit.

Git can answer either question about HEAD: "what name does it hold" or "what hash ID does it represent". Git just needs to ask itself the right question.
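
Here is a small sketch of that picture: two names on one commit, HEAD attached to develop, and a new commit that moves only develop. The repository location and the Demo identity are placeholders; the single starting commit stands in for A--B--C.

```shell
set -eu
# Placeholder identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo C > file.txt
git add file.txt
git commit -q -m "C"
git branch -M main             # make the first branch name match the drawings

git branch develop             # a second name for the same commit
git checkout -q develop        # attach HEAD to develop

echo D > file.txt
git commit -q -a -m "D"        # new commit D: only develop moves

git symbolic-ref --short HEAD  # question 1: which name is HEAD attached to?
git rev-parse HEAD             # question 2: which commit does that name hold?
git log --oneline --decorate --all
```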

This gets us back to the anomalous empty repository case

In an empty repository, there are no commits, and therefore there are no branch names. But if we "attach" HEAD to some non-existent name—in reality, in Git, that means writing the branch name to the file .git/HEAD (but with some complicating exceptions that we won't worry about here)—then running git commit can still do the four steps we showed:

  • gather metadata;
  • make snapshot;
  • write metadata-and-snapshot to obtain initial commit; and
  • write new commit's hash ID into branch name, thereby creating the branch name.

The last step—writing the hash ID—gives us:

A   <-- main (HEAD)

as long as the name stored "in" HEAD was main. (The metadata for commit A omits any parent hash IDs because Git notices, when reading HEAD, that the name main does not exist: this makes main an unborn branch, as Git usually calls it, or an orphan branch, as Git sometimes, inconsistently, calls it.)
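
The unborn-branch state is easy to poke at. In this sketch (placeholder paths and identity), HEAD is attached to main by hand with git symbolic-ref, so the result does not depend on any init.defaultBranch setting.

```shell
set -eu
# Placeholder identity so the commit works on a machine with no Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # attach HEAD to the (non-existent) name main

git symbolic-ref HEAD                   # HEAD names refs/heads/main ...
before=$(git for-each-ref refs/heads)   # ... yet no branch names exist
if git rev-parse --verify -q HEAD >/dev/null; then
  echo "HEAD resolves to a commit"
else
  echo "HEAD is attached to an unborn branch"
fi

echo hello > file.txt
git add file.txt
git commit -q -m "A"                    # the first commit creates the name main

git for-each-ref --format='%(refname)' refs/heads
```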

This explains some of your problems

If you run git init such that it creates a new, empty repository, there are no branch names yet. But you're "on" some branch anyway. The branch name you're "on" now determines the name of the branch you will create when you make the first commit.

The default name for a new, empty repository is master. In recent versions of Git (2.28 and later), you can pass -b main to git init to change this to main. You can also configure a setting, init.defaultBranch, to override the default master.

If you run git init such that it simply re-initializes an existing repository, you are on some branch now, and you're still on that branch. The init.defaultBranch setting, if it exists and would be used, isn't used here. You're just on whatever branch you're on: use git status or git symbolic-ref HEAD to find out what that is.

If you're in an empty repository right now, the branch you're "on" does not exist. So you can't rename it, at least in older versions of Git: git branch -m main may not work. (It might work, as this has been fixed in newer versions of Git: the branch rename code notices that you're on an unborn branch and renames the unborn branch. That is simple enough to do internally; it's just that nobody thought to do it before.)

There are two procedures that always work, regardless of your Git vintage:

  1. git checkout --orphan main will switch the name of the non-existent branch that you are on to main. Your next commit will now create main.
  2. You can create your first commit, which creates master, then rename master to main with git branch -m main.

These two procedures are of course only to be used in that empty-repository case. In a non-empty repository, you're still on whatever branch you were on before you ran git init unnecessarily. Your git init didn't do anything (so why did you bother?).
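
Both procedures can be tried side by side in scratch repositories. In this sketch, new_repo is a hypothetical helper that forces the unborn branch to be master, simulating a Git whose default initial name is master; the paths and identity are placeholders too.

```shell
set -eu
# Placeholder identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: an empty repository whose unborn branch is master.
new_repo() {
  dir=$(mktemp -d)
  git init -q "$dir"
  git -C "$dir" symbolic-ref HEAD refs/heads/master
  echo "$dir"
}

# Procedure 1: switch the unborn branch to main, then commit.
one=$(new_repo)
cd "$one"
git checkout -q --orphan main
echo hi > file.txt
git add file.txt
git commit -q -m "first"       # creates main directly

# Procedure 2: commit first (creating master), then rename.
two=$(new_repo)
cd "$two"
echo hi > file.txt
git add file.txt
git commit -q -m "first"       # creates master
git branch -m main             # rename master to main

git -C "$one" branch --list
git -C "$two" branch --list
```

Either way, each repository ends up with exactly one branch, named main.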

Using two repositories

You mention in step 2:

  1. Created a repo in Github e.g. my_repo

When you create a new repository with GitHub's web interface (or, presumably, with their gh command-line tool), you have two options:

  • Create a totally empty repository. This is like using git init.

  • Create a repository with one initial commit that has some files in it: a README, a LICENSE, a COPYRIGHT, maybe even a .gitignore, and that sort of thing. This is like using git init and then making a commit.

The difference between these is that in the second case, the GitHub repository now has one commit in it, which means it can have branch names. In the first case, the GitHub repository has no commits, which means it can't have any branch names.

GitHub have, of course, changed their own default initial branch name to main instead of master. This affects both the unborn-branch empty-repository case and the one-commit now-the-repository-isn't-empty-so-there's-a-branch case.

Unless you have some powerful need for their initial commit, I'd recommend you use the "create with no commits" option. This makes your remaining tasks trivial.

There's another, more complex case, where your GitHub repository has a bunch of commits that you'd like to keep. I'll touch on this one briefly for completeness. Here, you will make use of the fact that Git repositories share commits, but not branch names.

Whenever we connect two Gits to each other with git fetch or git push, we're invoking the commit-sharing stuff. But there are two important differences between fetch and push here:

  • git fetch means get me stuff from them. That is, I have my Git software, working on my repository ("my Git" collectively) call up their Git software and connect to their repository ("their Git"). Then my Git has their Git list out all their branch and tag names and the corresponding commit or other hash IDs. My Git then checks: for each hash ID, do I have that object? If not, my Git gets their objects from their Git, collecting all the commits they have that I don't in the process.

    So now I share all their commits. (I might have some of my own that they don't, though.) Now that I have all their commits, my Git can set remote-tracking names in my repository, to remember their branch names. My Git will create or update my origin/main and origin/develop to remember the hash IDs they have stored in their main and their develop.

    The end result of this is that after the fetch, I have all their commits, but none of my branches have been touched. My branch names are mine, not theirs. If I have a main and a fred, their main doesn't affect my main, and if they have a fred it doesn't affect mine either. If they have a barney or flintstone and I don't, I still don't have that branch. I share their commits, not their branches.

  • The git push command sends from my Git to their Git. I pick something—usually a branch name, but what I really need here is a commit hash ID—in my repository. I have my Git call up their Git and ask them if they have that hash ID. If not, my Git sends over that commit, and any previous commits that might be required that they don't already have, until they have all the commits leading up to that last commit that I just sent them too. (That way their git log can find all the commits, for instance.)

    But then, having sent them some of my commits—as many as I chose, but usually I choose by branch name so that I send them all the commits up to the last commit on some branch of mine—then something different happens. I don't have them set some sort of remote-tracking name. Instead, I ask them if they would please set one of their branch names to remember that commit hash ID.

    They can refuse to do this! It's their choice as to whether they will let me set one of their branch names. But if they allow it, I've just created or updated a branch name in their Git. I don't have to use the same name on both sides. I don't even have to use a name on my side at all:

    git push origin mybranch:free-the-ocean
    

    for instance, or:

    git push origin a123456:refs/heads/newbranch
    

    if a123456 is a valid shortened hash ID for a commit in my repository.

So we share commits. We do not share branch names, except to the extent that if I say git push origin mybranch, I mean git push origin mybranch:mybranch: I'm going to ask them to use the same branch name I'm using. It's still their branch name, though. It's not mine to control: I can only ask them to create or update theirs.
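
The fetch and push behavior described above can be checked with two local repositories and a bare one in between. This is a sketch under assumptions: theirs.git stands in for the GitHub repository, other is some second contributor, and all paths, file names, and the Demo identity are made up for the demonstration.

```shell
set -eu
# Placeholder identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q --bare "$work/theirs.git"

# Someone else gives "their" repository a main and a barney branch.
git init -q "$work/other"
cd "$work/other"
echo theirs > README
git add README
git commit -q -m "their commit"
git branch -M main
git branch barney
git push -q "$work/theirs.git" main barney

# My repository has its own main, unrelated to theirs.
git init -q "$work/mine"
cd "$work/mine"
echo mine > code.txt
git add code.txt
git commit -q -m "my commit"
git branch -M main
git remote add origin "$work/theirs.git"
before=$(git rev-parse main)

git fetch -q origin
git branch --list   # still only my main: no barney
git branch -r       # origin/main and origin/barney remember their names

# Push: send my commits and ASK them to set names of my choosing.
git push -q origin main:free-the-ocean
short=$(git rev-parse --short main)
git push -q origin "$short:refs/heads/newbranch"
git ls-remote --heads origin
```

After the fetch, my main still holds my commit while origin/main holds theirs, which is exactly the situation the question's "fetch first" rejection was complaining about.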

(GitHub in particular have "protected branches", which won't let people update them with git push. Of course, if I own the GitHub repository, I can use the GitHub web administration pages to de-protect the name and push to it, and then re-protect it. But you can see from this complicated process that I still wind up having to ask permission. It's just that I'm asking myself for permission.)

So what's this about origin/HEAD anyway?

I mentioned:

git remote set-head origin --auto

up in the TL;DR section. What this does is:

  • call up their Git (via the name origin);
  • ask them which of their branch names their HEAD is attached to; and
  • create or update my origin/HEAD name—its full name is refs/remotes/origin/HEAD—so that it is a symbolic ref to the remote-tracking name in my repository that corresponds to the same branch name in their repository.

Whew, that's a mouthful—or head-full—of concepts. Let's take it apart a bit:

  • their branch names, main and develop for instance
  • are reflected in my remote-tracking names origin/main and origin/develop
  • which my Git maintains when I run git fetch and git push.

That is, I run git fetch origin: my Git connects to their Git and gets from them a list of all their branch names and the commit hash IDs that go with those names. My Git makes sure I share all those commits, too. Now that I have all their commits, I can create or update remote-tracking names for each of their branch names. My Git will do this automatically for me.

If their HEAD holds one of their branch names (and it does), then my origin/HEAD could hold one of my remote-tracking names. The git fetch command does not maintain this automatically.³ Running git remote set-head lets you have your Git update it; see the git remote documentation for details.
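
As a rough illustration of git remote set-head, the sketch below uses a local bare repository as "their Git" and points its HEAD at main by hand, since there is no hosting page to click on here. Paths and identity are placeholders, and what git fetch itself does with origin/HEAD may differ between Git versions, so the sketch only looks at the state after set-head.

```shell
set -eu
# Placeholder identity so the commit works on a machine with no Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q --bare "$work/theirs.git"
# Their HEAD names main (a hosting site would offer a settings page for this).
git -C "$work/theirs.git" symbolic-ref HEAD refs/heads/main

git init -q "$work/mine"
cd "$work/mine"
echo hello > file.txt
git add file.txt
git commit -q -m "first"
git branch -M main
git remote add origin "$work/theirs.git"
git push -q origin main
git fetch -q origin

# Ask their Git which branch its HEAD is attached to, and record that locally.
git remote set-head origin --auto

# My origin/HEAD is now a symbolic ref to my remote-tracking name.
git symbolic-ref refs/remotes/origin/HEAD
```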

The name stored in their HEAD is the name their Git will recommend at git clone time. That is, when you run:

git clone <url>

your git clone operation will:

  1. create an empty directory and do all its work there;
  2. use git init to create a new, empty repository;
  3. use git remote add origin url to create origin;
  4. insert any necessary git config operations here;
  5. run git fetch;
  6. run git checkout -b somebranch --track origin/somebranch.

The branch name in step 6 that your Git creates, based on your origin/ name that's based on one of their branch names, is the name you supplied to the -b option to git clone. If you didn't supply a -b option, your Git asks their Git which name they recommend. They recommend whatever name is in their HEAD. So that's what their HEAD is really for: to act as a default recommendation to git clone operations.
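
The six steps can be approximated by hand. This is only a sketch of the idea, not what git clone literally runs: a local repository stands in for the URL, the paths and identity are placeholders, and git ls-remote --symref is used to read the remote's recommended name.

```shell
set -eu
# Placeholder identity so the commit works on a machine with no Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
# Stand-in for <url>: a repository with one commit, whose HEAD is on main.
git init -q "$work/src"
cd "$work/src"
echo hello > file.txt
git add file.txt
git commit -q -m "first"
git branch -M main
url="$work/src"

mkdir "$work/clone"             # step 1: empty directory
cd "$work/clone"
git init -q                     # step 2: new, empty repository
git remote add origin "$url"    # step 3: create origin
                                # step 4: no extra configuration in this sketch
git fetch -q origin             # step 5: get their commits, set origin/* names

# Step 6: ask which branch their HEAD recommends, then create and track it.
recommended=$(git ls-remote --symref origin HEAD |
  sed -n 's|^ref: refs/heads/\(.*\)[[:space:]]HEAD$|\1|p')
git checkout -q -b "$recommended" --track "origin/$recommended"

git status --short --branch
```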

Note that there's no Git protocol for setting HEAD in someone else's repository. GitHub offer a configuration page where you can set your own repositories' HEADs. (At least one cloud provider—Google—seem to have failed to set up such a page, which leaves their HEADs set to master even if they have a main instead.)


³ I don't really know why. It seems like it should. It doesn't: you need to run git remote instead. But Git explicitly lets you set origin/HEAD to any of your own remote-tracking names, regardless of which one they have theirs set to. This is theoretically useful (see the gitrevisions documentation and note step 6 of the name-resolving protocol), but I've never found it useful myself.


Stale remote-tracking names

Note that git fetch does not clean up automatically: if they had a rumplestiltskin branch yesterday, and your Git made origin/rumplestiltskin to match, and today their rumplestiltskin is gone, your left-over origin/rumplestiltskin sits there gathering dust. You can run git fetch --prune or git remote prune origin to fix these up, or you can set fetch.prune to true in your configuration to make git fetch act like git fetch --prune by default.
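
A quick way to watch a stale name appear and disappear, again with a local bare repository standing in for the remote. The paths and identity are placeholders, and fetch.prune is forced to false for the plain fetch so the result does not depend on your own configuration.

```shell
set -eu
# Placeholder identity so the commit works on a machine with no Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q --bare "$work/theirs.git"

# Yesterday: they have main and rumplestiltskin.
git init -q "$work/other"
cd "$work/other"
echo hello > file.txt
git add file.txt
git commit -q -m "first"
git branch -M main
git branch rumplestiltskin
git push -q "$work/theirs.git" main rumplestiltskin

# My Git fetches and creates origin/rumplestiltskin.
git init -q "$work/mine"
cd "$work/mine"
git remote add origin "$work/theirs.git"
git fetch -q origin

# Today: their rumplestiltskin is deleted.
git -C "$work/other" push -q "$work/theirs.git" --delete rumplestiltskin

# A plain fetch leaves my stale remote-tracking name in place.
git -c fetch.prune=false fetch -q origin
stale=$(git for-each-ref --format='%(refname:short)' refs/remotes/origin/rumplestiltskin)
echo "after plain fetch: ${stale:-<gone>}"

# Pruning removes it.
git fetch -q --prune origin
git branch -r
```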

Upvotes: 2
