Reputation: 731
This is driving me up the wall!
This is what I've done:
Created a repo in Github e.g. my_repo
git init && \
git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/main && \
git remote add origin https://github.com/my_github/my_repo.git && \
git checkout -b init_branch && \
git add * && \
git commit -m "Initial commit" && \
git push origin init_branch && \
git checkout main && \
git merge init_branch
The error I'm seeing:
! [rejected] init_branch -> init_branch (fetch first)
error: failed to push some refs to 'https://github.com/my_github/my_repo.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
Occasionally I also get this error (not sure why):
error: pathspec 'main' did not match any file(s) known to git
What I've tried:
- Changing main to master and pushing straight to master - that worked but want to keep it as main.
- main - doesn't work.
- git remote add origin ... straight after git init - doesn't work.
- symbolic-ref line, added git config --global init.defaultBranch main and git pull origin main. New error:
* branch main -> FETCH_HEAD
* [new branch] main -> origin/main
error: The following untracked working tree files would be overwritten by merge:
.gitignore
Please move or remove them before you merge.
Aborting
Any help would be much appreciated
Upvotes: 0
Views: 843
Reputation: 490078
TL;DR
Make sure you create an empty repository on GitHub. Don't put in that initial commit. Then you can get rid of most of this stuff: you'll just create your new repository, commit to create main (and rename to main if necessary), do a git remote add origin url, and git push origin main.
(You can then, if you wish, use git remote set-head origin --auto, although I personally find no value here.)
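As a rough sketch of that sequence, the following uses a local bare repository as a stand-in for an empty GitHub repository, so that it can run without network access. The temporary directory, the README.md file, and the "Example" identity are all assumptions made for illustration; with a real GitHub repository you would pass its URL to git remote add instead.

```shell
set -eu
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
tmp=$(mktemp -d)

# Stand-in for a GitHub repository created with no initial commit.
git init -q --bare "$tmp/remote.git"

# The new local repository.
git init -q "$tmp/my_repo"
cd "$tmp/my_repo"
echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"   # creates the initial branch (master or main)
git branch -M main                  # rename it to main if necessary
git remote add origin "$tmp/remote.git"
git push -q origin main

# The remote now has a main branch holding the same commit.
local_tip=$(git rev-parse main)
remote_tip=$(git --git-dir="$tmp/remote.git" rev-parse refs/heads/main)
echo "local main:  $local_tip"
echo "remote main: $remote_tip"
```

Because the remote starts out with no commits, there is nothing to fetch first and the push is not rejected.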
OK, first, don't do this at all:
git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/main && \
It's not inherently wrong but it has no value.¹ It has nothing to do with the errors you're getting, but removing it gets one more step out of the way and removes distractions from your view.
With that said, I think your problem stems from a fundamental misunderstanding of Git: of what it does, why, and how. To fix that, let's take a quick overview of Git repositories.
¹ I'm of the opinion that the existence of refs/remotes/origin/HEAD itself has no value to begin with, but even if you disagree, you should be using git remote or git ls-remote to obtain their symbolic HEAD here, rather than hard-coding main. Your remote-tracking names are supposed to reflect their branch names as they exist now, not as you wish they existed. 😀
A Git repository is, in the end, all about commits. To get there, Git has a big database of what it calls objects. There are four kinds of objects: commits, tags (annotated tags), trees, and blobs. The commits are the ones we (humans) mostly care about; the others just have supporting roles and we can mostly ignore them.
Each object, including the crucial commit objects, has a unique hash ID. This hash ID is how Git finds the object. Git stores the objects in a simple key-value database, with the hash ID being the key. So we need the hash ID to retrieve a commit. But hash IDs are big and ugly, e.g., 5a73c6bdc717127c2da99f57bc630c4efd8aed02. They need to be big and ugly because every commit in the universe has to have its own hash ID.²
But this makes them too difficult for humans to use. There's a solution for that though. Besides the commit-and-other-objects database, each Git repository also has a names database. Here, the database is still a key-value store, but this time the keys are things like branch and tag names, and the values are big ugly hash IDs.
Each name maps to exactly one hash ID. That's all we need, in Git, because a branch name automatically means "the latest commit on that branch", by storing the hash ID of the latest commit on that branch. (We'll skip over how that leads to all the rest of the commits, at least for now.)
The two databases, then, allow us to:
- use a name, such as master or main, to find a commit's hash ID; and
- use that hash ID to find the commit itself.
The commit itself holds two things: a snapshot of every file, in the form each file had at the time you (or whoever) made the commit, and some metadata, or information about the commit itself. Again, we won't go into any of the details here just yet. All we need to know right now is this two-database thing.
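The two lookups can be tried by hand. This is only a sketch in a throwaway repository: the temporary directory, the empty commit, and the "Example" identity are assumptions for illustration, and the branch name will be master or main depending on your configuration.

```shell
set -eu
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q
git commit -q --allow-empty -m "first"
branch=$(git symbolic-ref --short HEAD)

# Names database: branch name -> hash ID.
hash=$(git rev-parse "$branch")
echo "$branch -> $hash"

# Objects database: hash ID -> object (here, a commit).
kind=$(git cat-file -t "$hash")
echo "object type: $kind"
git cat-file -p "$hash"   # prints the tree, author, committer, and log message
```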
² This is technically impossible: eventually, two different commits will get the same hash ID. Fortunately, there are two tools we can use here: (1) we can make the hash ID big and ugly so that the "eventually" does not happen for billions of years; (2) we can avoid introducing two Git repositories to each other if they have what I call doppelgängers: objects that should be different but have the same hash ID. In service of item 1, Git is in the process of moving to bigger hash IDs today.
git init
Running git init in a place (folder / directory) where there is as yet no Git repository will create a new, empty repository. This repository has no commits (its object database is essentially empty), and since names are required to hold valid object hash IDs, with branch names being required to hold commit hash IDs, the names database is also empty. There are no commits, so there cannot be any branch names.
Note that running git init in a place where there is a Git repository is mostly a big and slightly-slow nothing. Git will say that it "re-initialized" the repository: Git did some housekeeping type items, and checked up on some auxiliary items like templated Git hooks and so on. But the important parts of the existing repository (the two databases) are completely undisturbed by a git init.
With an empty repository, though, there are no commits and therefore no branches. And yet, even in an empty repository like this, you're "on" some branch: git status will say on branch master or on branch main or whatever. What's going on here?
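It's time to talk a bit more about commit data-and-metadata. We already mentioned that each commit has two parts:
- a snapshot of every file (the data); and
- some metadata, or information about the commit itself.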
The snapshot is actually stored indirectly, through supporting objects. This allows the files stored inside commits to be de-duplicated. That way, since most commits mostly re-use files that are already in existing commits, the new commits don't need any space to store the files. They just latch on to the already-existing files. Since no part of any commit can ever be changed—not even by Git itself—it's quite safe to re-use pieces of old commits.
The metadata in a commit include things like when you made the commit and your log message: stuff that git log shows. But for Git's own use, each commit stores a list of the raw hash IDs (the big ugly "true names") of previous commits. Most commits store exactly one entry in this list, which we call the parent commit. We say that these hash IDs point to the parent.
In a non-empty repository with, say, three commits, we could draw them like this:
A <-B <-C
Here C is the third and latest commit (with some actual big ugly hash ID, but we're just calling it C for commit). Inside commit C, in its metadata, we have the stuff that git log shows, plus the raw hash ID of earlier commit B, plus of course the data: the snapshot. So by reading C, Git can find commit B. This has a snapshot too. By comparing the two snapshots, Git can show us what changed between B and C, and that's what git log -p does.
Having shown commit C, though, git log now uses the parent hash ID to step back one hop, to commit B. B, like C, has data and metadata, and its metadata point to commit A. So git log can use the snapshots in A and B to show us what changed in B ... and then git log can move on (or rather, back) to commit A.
Commit A is a bit special. It has no parent. It can't! It was the first commit ever. There was no previous commit. So it just doesn't have any parents at all. Git knows that this means that "what changed" in commit A is that every file in it was added from scratch, and git log knows that having shown commit A, it can finally stop going backwards: there's nowhere left to go.
The fly in the ointment here should be pretty obvious. We had our git log work by "magically" knowing the hash ID of the last commit C. But there is no magic. Where did Git get commit C's hash ID?
The answer is: from a branch name. We have some name, like master or main, that holds commit C's hash ID. We say that this name points to the commit:
A <-B <-C <--main
This name-and-value pair is stored in the names database, so Git just has to look that one up to get the hash ID for commit C.
What this all means is that we use branch names to find the last commit of a branch, and then use the commits to find other, earlier commits. That's what a branch name in Git is all about. But there are several tricks. One is that more than one branch name can point to the same commit. Let's draw that case:
A--B--C <-- develop, main
(I've gotten deliberately lazy about drawing the arrows that lead backwards from one commit to the previous one here, mostly because the arrows in arrow-fonts don't always render very well on Stack Overflow, plus they are annoying to type in.)
We are going to be using commit C, no matter which branch name we pick, but we need to pick a branch name. We need to mark that branch name as "the name we're using", too. To do that, we'll attach the special name HEAD, written in all uppercase like this, to just one branch name:
A--B--C <-- develop, main (HEAD)
We're now using the name main to find commit C.
If we run git switch develop or git checkout develop, the picture changes slightly:
A--B--C <-- develop (HEAD), main
We're still using commit C, but now we're using it through the name develop.
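Why does this matter? Well, let's make a new commit now. Without worrying about how we make new commits, we note that when we do make a new commit, Git does the following:
1. Git gathers the metadata for the new commit, using your user.name and user.email settings to put in as the name and email address of the author of the new commit.
2. Git saves a snapshot of every file, as the data for the new commit.
3. Git writes out the new commit, with the current commit as its parent. The new commit gets a new, unique hash ID, but we'll just call it D here.
4. Git writes D's hash ID to the current branch name.
Since commit D's parent is the current commit C in step 3, new commit D points backwards to existing commit C. But step 4 updates the name develop, because that's the name HEAD is attached to: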
A--B--C <-- main
       \
        D <-- develop (HEAD)
HEAD thus provides two things:
- the name of the current branch; and
- through that name, the hash ID of the current commit.
Git can answer either question about HEAD: "what name does it hold" or "what hash ID does it represent". Git just needs to ask itself the right question.
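Here is a sketch of the picture above, asking Git both questions about HEAD after making commit D. The scratch repository, the empty commits, and the "Example" identity are assumptions for illustration only.

```shell
set -eu
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q
git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
git commit -q --allow-empty -m "C"
git branch -M main

git branch develop         # develop and main now both point to C
git checkout -q develop    # attach HEAD to develop
hash_of_c=$(git rev-parse HEAD)

git commit -q --allow-empty -m "D"   # only the current branch name moves

head_name=$(git symbolic-ref --short HEAD)   # "what name does it hold?"
head_hash=$(git rev-parse HEAD)              # "what hash ID does it represent?"
main_hash=$(git rev-parse main)
develop_hash=$(git rev-parse develop)
parent_of_d=$(git rev-parse develop^)

echo "HEAD is attached to: $head_name"
echo "main:    $main_hash"
echo "develop: $develop_hash"
```

The name main still holds C's hash ID, while develop (and therefore HEAD) now finds D, whose parent is C.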
In an empty repository, there are no commits, and therefore there are no branch names. But if we "attach" HEAD to some non-existent name (in reality, in Git, that means writing the branch name to the file .git/HEAD, but with some complicating exceptions that we won't worry about here), then running git commit can still do the four steps we showed. The last step, writing the hash ID, gives us:
A <-- main (HEAD)
as long as the name stored "in" HEAD was main. (The metadata for commit A omits any parent hash IDs because Git notices, when reading HEAD, that the name main does not exist: this makes main an unborn branch, as Git usually calls it, or an orphan branch, as Git sometimes, inconsistently, calls it.)
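The unborn-branch state can be observed directly. In this sketch, git symbolic-ref is used to attach HEAD to the name main so that the result does not depend on your configured default; the scratch directory, the empty commit, and the "Example" identity are assumptions.

```shell
set -eu
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
repo=$(mktemp -d)
cd "$repo"
git init -q

# Attach HEAD to the (not yet existing) branch name main.
git symbolic-ref HEAD refs/heads/main
attached=$(git symbolic-ref HEAD)

# HEAD holds a branch name, but that name is not in the names database yet.
if git rev-parse -q --verify refs/heads/main >/dev/null; then
  before=exists
else
  before=unborn
fi

# The first commit creates the branch name and has no parent.
git commit -q --allow-empty -m "A"
if git rev-parse -q --verify refs/heads/main >/dev/null; then
  after=exists
else
  after=unborn
fi
parents_of_a=$(git log -1 --format=%P)

echo "HEAD -> $attached; before commit: $before; after commit: $after"
```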
If you run git init such that it creates a new, empty repository, there are no branch names yet. But you're "on" some branch anyway. The branch name you're "on" now determines the name of the branch you will create when you make the first commit.
The default name for a new, empty repository is master. In Git 2.28 and later, you can use -b main as arguments to git init to change this to main. You can also configure a setting, init.defaultBranch, to override the default master.
If you run git init such that it simply re-initializes an existing repository, you are on some branch now, and you're still on that branch. The init.defaultBranch setting, if it exists and would be used, isn't used here. You're just on whatever branch you're on: use git status or git symbolic-ref HEAD to find out what that is.
If you're in an empty repository right now, the branch you're "on" does not exist. So you can't rename it, at least in older versions of Git: git branch -m main may not work. (It might work, as this has been fixed in newer versions of Git: the branch rename code notices that you're on an unborn branch, and renames the unborn branch. This is simple enough to do internally; it's just that nobody thought to do it before.)
There are two procedures that always work, regardless of your Git vintage:
- git checkout --orphan main will switch the name of the non-existent branch that you are on to main. Your next commit will now create main.
- Make your first commit, which creates master, then rename master to main with git branch -m main.
These two procedures are of course only to be used in that empty-repository case. In a non-empty repository, you're still on whatever branch you were on before you ran git init unnecessarily. Your git init didn't do anything (so why did you bother?).
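Both procedures can be sketched in scratch repositories. To mimic the older default regardless of any init.defaultBranch setting, each repository's unborn branch is first pinned to master with git symbolic-ref; that pinning, the temporary directories, the empty commits, and the "Example" identity are assumptions made for this illustration.

```shell
set -eu
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Procedure 1: switch the unborn branch's name, then commit.
one=$(mktemp -d)
cd "$one"
git init -q
git symbolic-ref HEAD refs/heads/master   # start "on" unborn master
git checkout -q --orphan main             # now "on" unborn main
git commit -q --allow-empty -m "Initial commit"
first=$(git symbolic-ref --short HEAD)

# Procedure 2: commit on master, then rename master to main.
two=$(mktemp -d)
cd "$two"
git init -q
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Initial commit"
git branch -m main
second=$(git symbolic-ref --short HEAD)
if git rev-parse -q --verify refs/heads/master >/dev/null; then
  master_left=yes
else
  master_left=no
fi

echo "procedure 1 ends on: $first"
echo "procedure 2 ends on: $second (master still present: $master_left)"
```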
You mention in step 2:
- Created a repo in Github e.g. my_repo
When you create a new repository with GitHub's web interface (or, presumably, with their gh command-line tool), you have two options:
1. Create a totally empty repository. This is like using git init.
2. Create a repository with one initial commit that has some files in it: a README, a LICENSE, a COPYRIGHT, maybe even a .gitignore, and that sort of thing. This is like using git init and then making a commit.
The difference between these is that in the second case, the GitHub repository now has one commit in it, which means it can have branch names. In the first case, the GitHub repository has no commits, which means it can't have any branch names.
GitHub have, of course, changed their own default initial branch name to main instead of master. This affects both the unborn-branch empty-repository case and the one-commit now-the-repository-isn't-empty-so-there's-a-branch case.
Unless you have some powerful need for their initial commit, I'd recommend you use the "create with no commits" option. This makes your remaining tasks trivial.
There's another, more complex case, where your GitHub repository has a bunch of commits that you'd like to keep. I'll touch on this one briefly for completeness. Here, you will make use of the fact that Git repositories share commits, but not branch names.
Whenever we connect two Gits to each other with git fetch or git push, we're invoking the commit-sharing stuff. But there are two important differences between fetch and push here:
git fetch means get me stuff from them. That is, I have my Git software, working on my repository ("my Git" collectively) call up their Git software and connect to their repository ("their Git"). Then my Git has their Git list out all their branch and tag names and the corresponding commit or other hash IDs. My Git then checks: for each hash ID, do I have that object? If not, my Git gets their objects from their Git, collecting all the commits they have that I don't in the process.
So now I share all their commits. (I might have some of my own that they don't, though.) Now that I have all their commits, my Git can set remote-tracking names in my repository, to remember their branch names. My Git will create or update my origin/main and origin/develop to remember the hash IDs they have stored in their main and their develop.
The end result of this is that after the fetch, I have all their commits, but none of my branches have been touched. My branch names are mine, not theirs. If I have a main and a fred, their main doesn't affect my main, and if they have a fred it doesn't affect mine either. If they have a barney or flintstone and I don't, I still don't have that branch. I share their commits, not their branches.
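A sketch of that behavior with two local repositories, one playing "theirs" and one "mine". The directory names, the empty commits, and the "Example" identity are assumptions; a local path stands in for a URL.

```shell
set -eu
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
tmp=$(mktemp -d)

# "Their" repository, with a main and a barney.
git init -q "$tmp/theirs"
cd "$tmp/theirs"
git commit -q --allow-empty -m "their first commit"
git branch -M main
git branch barney
their_main=$(git rev-parse main)

# "My" repository, with my own main.
git init -q "$tmp/mine"
cd "$tmp/mine"
git commit -q --allow-empty -m "my first commit"
git branch -M main
mine_before=$(git rev-parse main)

git remote add origin "$tmp/theirs"
git fetch -q origin

mine_after=$(git rev-parse main)        # my branch name is untouched
tracked=$(git rev-parse origin/main)    # my memory of their main
if git rev-parse -q --verify refs/heads/barney >/dev/null; then
  have_barney=yes
else
  have_barney=no
fi
tracked_barney=$(git rev-parse origin/barney)

echo "my main before/after fetch: $mine_before / $mine_after"
echo "origin/main: $tracked (their main: $their_main)"
echo "do I have my own barney? $have_barney"
```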
The git push command sends from my Git to their Git. I pick something (usually a branch name, but what I really need here is a commit hash ID) in my repository. I have my Git call up their Git and ask them if they have that hash ID. If not, my Git sends over that commit, and any previous commits that might be required that they don't already have, until they have all the commits leading up to that last commit that I just sent them too. (That way their git log can find all the commits, for instance.)
But then, having sent them some of my commits—as many as I chose, but usually I choose by branch name so that I send them all the commits up to the last commit on some branch of mine—then something different happens. I don't have them set some sort of remote-tracking name. Instead, I ask them if they would please set one of their branch names to remember that commit hash ID.
They can refuse to do this! It's their choice as to whether they will let me set one of their branch names. But if they allow it, I've just created or updated a branch name in their Git. I don't have to use the same name on both sides. I don't even have to use a name on my side at all:
git push origin mybranch:free-the-ocean
for instance, or:
git push origin a123456:refs/heads/newbranch
if a123456 is a valid shortened hash ID for a commit in my repository.
So we share commits. We do not share branch names, except to the extent that if I say git push origin mybranch, I mean git push origin mybranch:mybranch: I'm going to ask them to use the same branch name I'm using. It's still their branch name, though. It's not mine to control: I can only ask them to create or update theirs.
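The following sketch pushes to a local bare repository under names that differ from mine, once from a branch name and once from a shortened hash ID. The bare repository path, the empty commit, and the "Example" identity are assumptions for illustration.

```shell
set -eu
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
tmp=$(mktemp -d)
git init -q --bare "$tmp/remote.git"    # stands in for "their Git"

git init -q "$tmp/mine"
cd "$tmp/mine"
git commit -q --allow-empty -m "first"
git checkout -q -b mybranch
git remote add origin "$tmp/remote.git"

# Ask them to set their free-the-ocean from my mybranch.
git push -q origin mybranch:free-the-ocean

# No name on my side at all: a shortened hash ID and a full name on theirs.
short=$(git rev-parse --short HEAD)
git push -q origin "$short:refs/heads/newbranch"

mine=$(git rev-parse mybranch)
their_ocean=$(git --git-dir="$tmp/remote.git" rev-parse refs/heads/free-the-ocean)
their_new=$(git --git-dir="$tmp/remote.git" rev-parse refs/heads/newbranch)
if git --git-dir="$tmp/remote.git" rev-parse -q --verify refs/heads/mybranch >/dev/null; then
  their_mybranch=yes
else
  their_mybranch=no
fi

git ls-remote --heads origin   # lists their branch names and hash IDs
```

They end up with free-the-ocean and newbranch, but no mybranch, because I never asked them to create one.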
(GitHub in particular have "protected branches", which won't let people update them with git push. Of course, if I own the GitHub repository, I can use the GitHub web administration pages to de-protect the name and push to it, and then re-protect it. But you can see from this complicated process that I still wind up having to ask permission. It's just that I'm asking myself for permission.)
What is origin/HEAD anyway?
I mentioned:
git remote set-head origin --auto
up in the TL;DR section. What this does is:
- call up the other Git (the one at origin);
- ask them which branch name their HEAD is attached to; and
- create or update my origin/HEAD name (its full name is refs/remotes/origin/HEAD) so that it is a symbolic ref to the remote-tracking name in my repository that corresponds to the same branch name in their repository.
Whew, that's a mouthful (or head-full) of concepts. Let's take it apart a bit:
- They have branch names: main and develop for instance.
- I have remote-tracking names that remember theirs: origin/main and origin/develop.
- My remote-tracking names get created or updated by git fetch and git push.
That is, I run git fetch origin: my Git connects to their Git and gets from them a list of all their branch names and the commit hash IDs that go with those names. My Git makes sure I share all those commits, too. Now that I have all their commits, I can create or update remote-tracking names for each of their branch names. My Git will do this automatically for me.
If their HEAD holds one of their branch names (and it does), then my origin/HEAD could hold one of my remote-tracking names. The git fetch command does not maintain this automatically.³ Running git remote set-head lets you have your Git update it; see the git remote documentation for details.
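As a sketch, here is git remote set-head run against a local stand-in for "their" repository, alongside git ls-remote --symref, which shows what their HEAD holds. The directory names, the empty commit, and the "Example" identity are assumptions.

```shell
set -eu
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
tmp=$(mktemp -d)

# "Their" repository: branches main and develop, with HEAD attached to main.
git init -q "$tmp/theirs"
cd "$tmp/theirs"
git commit -q --allow-empty -m "first"
git branch -M main
git branch develop

# "My" repository.
git init -q "$tmp/mine"
cd "$tmp/mine"
git remote add origin "$tmp/theirs"
git fetch -q origin

# Ask them what their HEAD holds (first output line: "ref: refs/heads/main").
git ls-remote --symref origin HEAD

# Have my Git set origin/HEAD to match.
git remote set-head origin --auto
target=$(git symbolic-ref refs/remotes/origin/HEAD)
echo "origin/HEAD -> $target"
```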
The name stored in their HEAD is the name their Git will recommend at git clone time. That is, when you run:
git clone <url>
your git clone operation will:
1. create a new, empty directory (folder) to hold the clone;
2. run git init to create a new, empty repository;
3. run git remote add origin url to create origin;
4. run any additional git config operations here;
5. run git fetch;
6. run git checkout -b somebranch --track origin/somebranch.
The branch name in step 6 that your Git creates, based on your origin/ name that's based on one of their branch names, is the name you supplied to the -b option to git clone. If you didn't supply a -b option, your Git asks their Git which name they recommend. They recommend whatever name is in their HEAD. So that's what their HEAD is really for: to act as a default recommendation to git clone operations.
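The six steps can be approximated by hand. This is a sketch of the idea only, not what git clone literally executes internally: a local repository stands in for the URL, and the directory names, the awk extraction of the recommended name, and the "Example" identity are assumptions.

```shell
set -eu
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
tmp=$(mktemp -d)

# "Their" repository, with HEAD attached to main.
git init -q "$tmp/theirs"
cd "$tmp/theirs"
git commit -q --allow-empty -m "first"
git branch -M main

mkdir "$tmp/clone"                      # step 1: new, empty directory
cd "$tmp/clone"
git init -q                             # step 2
git remote add origin "$tmp/theirs"     # step 3
                                        # step 4: extra git config calls would go here
git fetch -q origin                     # step 5

# No -b given, so ask their Git which name their HEAD recommends.
recommended=$(git ls-remote --symref origin HEAD |
  awk '/^ref:/ { sub("refs/heads/", "", $2); print $2 }')

git checkout -q -b "$recommended" --track "origin/$recommended"   # step 6

current=$(git symbolic-ref --short HEAD)
upstream=$(git rev-parse --abbrev-ref '@{upstream}')
echo "on $current, tracking $upstream"
```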
Note that there's no Git protocol for setting HEAD in someone else's repository. GitHub offer a configuration page where you can set your own repositories' HEADs. (At least one cloud provider, Google, seem to have failed to set up such a page, which leaves their HEADs set to master even if they have a main instead.)
³ I don't really know why. It seems like it should. It doesn't: you need to run git remote instead. But Git explicitly lets you set origin/HEAD to any of your own remote-tracking names, regardless of which one they have theirs set to. This is theoretically useful (see the gitrevisions documentation and note step 6 of the name-resolving protocol), but I've never found it useful myself.
Note that git fetch does not clean up automatically: if they had a rumplestiltskin branch yesterday, and your Git made origin/rumplestiltskin to match, and today their rumplestiltskin is gone, your left-over origin/rumplestiltskin sits there gathering dust. You can run git fetch --prune or git remote prune origin to fix these up, or you can set fetch.prune to true in your configuration to make git fetch act like git fetch --prune by default.
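A sketch of the stale-name situation and its cleanup, again using a local repository as "theirs". The directory names, the empty commit, and the "Example" identity are assumptions made for illustration.

```shell
set -eu
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
tmp=$(mktemp -d)

git init -q "$tmp/theirs"
cd "$tmp/theirs"
git commit -q --allow-empty -m "first"
git branch -M main
git branch rumplestiltskin

git init -q "$tmp/mine"
cd "$tmp/mine"
git remote add origin "$tmp/theirs"
git fetch -q origin                      # creates origin/rumplestiltskin

# Today, their rumplestiltskin is gone.
git -C "$tmp/theirs" branch -q -D rumplestiltskin

# A plain fetch leaves my remote-tracking name behind.
git fetch -q origin
if git rev-parse -q --verify refs/remotes/origin/rumplestiltskin >/dev/null; then
  after_plain_fetch=stale
else
  after_plain_fetch=gone
fi

# Pruning removes it.
git fetch -q --prune origin
if git rev-parse -q --verify refs/remotes/origin/rumplestiltskin >/dev/null; then
  after_prune=stale
else
  after_prune=gone
fi

echo "after plain fetch: $after_plain_fetch; after fetch --prune: $after_prune"
```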
Upvotes: 2