Reputation: 33
After reading the GitHub documentation, I wanted to push all of my existing local projects to a pre-existing repository.
For a folder like this:
project_abc
- basic
- extended
I wanted to push it to a repository named "practice" such that project_abc lies inside it.
But I stumbled across errors on my very first such folder.
On attempting to push, I get this:
! [rejected] main -> main (fetch first)
error: failed to push some refs to 'github.com:Githubuser/practice.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
When I pull in the hope of fixing the issue, I get:
fatal: refusing to merge unrelated histories
Upvotes: 1
Views: 3731
Reputation: 487755
Your basic error is that you are conflating project and repository here.
Some people do like to put multiple (albeit usually related) projects into a single repository. The jargon term for this is monorepo: a monorepo stores all those projects in one Git repository (in this, Git's, case; but other version control systems often work very similarly), which makes many things more convenient, and some other things less convenient. The linked Wikipedia page has a decent overview.
To put multiple projects into a monorepo, you make the one repository (i.e., run `git init` once) and then organize everything into that one repository. You can then clone that repository to make your distributed copies of it, using `git clone` and `git fetch` and `git push` as appropriate (see below).
To put multiple projects into multiple repositories, make each repository separately with an appropriate `git init`, organize the one project within that repository, and so on. The process is the same as for the monorepo setup, except that now you have multiple repositories. The obvious term for this would be polyrepo, but that appears not to be in common use. At least, there's no Wikipedia page. (The term does turn up in Google search, and I will use it here.)
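As a rough sketch of the difference, the two layouts could be set up as below. The directory names under the temporary area, including `project_xyz`, are made up for illustration; only `practice` and `project_abc` come from the question.

```shell
set -e
work=$(mktemp -d)   # throwaway area; every path below is hypothetical

# Monorepo: run git init once; all projects live inside that one working tree.
mkdir -p "$work/mono/practice/project_abc" "$work/mono/practice/project_xyz"
git init -q "$work/mono/practice"

# Polyrepo: run git init once per project.
mkdir -p "$work/poly/project_abc" "$work/poly/project_xyz"
git init -q "$work/poly/project_abc"
git init -q "$work/poly/project_xyz"

# One .git directory for the monorepo, one per project for the polyrepo.
ls -d "$work"/mono/*/.git "$work"/poly/*/.git
```

In the monorepo case the project folders themselves contain no `.git` directory; they are just subdirectories of the one working tree.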
Git is a distributed version control system, or DVCS. A version control system holds versions (of files). These come in many flavors, and lately the usual distinction is distributed versus centralized. A centralized VCS has an authoritative "this is the correct one" location: there might be additional copies, but only the central VCS has the "real" copies. A distributed VCS lacks this central control point: each copy of the repository is "the real copy" even if they're all slightly different.
As with monorepo vs polyrepo, CVCS vs DVCS offers advantages and disadvantages. We won't try to cover those here since you've already chosen the DVCS.
Git in particular stores versions as commits. Commits are Git's raison d'être and a repository is mainly just a big database of commits. However, each commit in a Git repository is found by a very large, random-looking number, expressed in hexadecimal as a big ugly hash ID. These are impossible for humans to remember and to deal with, so Git provides a second database of names: branch names, tag names, remote-tracking names, and all kinds of other names. A Git repository will use its secondary names database to remember the raw hash IDs for you, so that you can use human-oriented names to extract your versions.
A Git repository is therefore really two databases:
The big database contains commits, and other supporting Git objects. These are numbered (with big ugly hash IDs); Git needs the hash IDs in order to use this database.
The smaller (usually much smaller) database contains names, which you (a human) will use to have Git find the hash IDs so that Git can find the commits in the (usually much bigger) objects database.
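A small sketch of the two databases at work, in a throwaway repository. The file name, commit message and identity are placeholders, and `git init -b` assumes Git 2.28 or later.

```shell
set -e
# Placeholder identity so the commit works without any global configuration.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q -b main
echo "hello" > README.txt
git add README.txt
git commit -q -m "first commit"

hash=$(git rev-parse main)        # names database: branch name -> hash ID
kind=$(git cat-file -t "$hash")   # objects database: hash ID -> object type
echo "main holds $hash, which is a $kind"
git for-each-ref                  # every name in this repository, with its hash ID
```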
Each commit's unique number lets Git extract that one particular commit later. Every commit stores two things:
Each commit has a full snapshot of every file, frozen in time in the form it had when you (or whoever) made the snapshot. To save space, these are stored in a special, read-only, Git-only, compressed and de-duplicated format. Only Git can read them, and literally nothing, not even Git itself, can overwrite them. But this makes them almost useless, except as we'll see below.
Each commit also has some metadata, or information about that particular commit. This includes stuff like the name and email address of whoever made the commit, and some date-and-time stamps. This also includes a list of previous commit hash IDs, usually with exactly one entry in the list.
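To look at both parts of a commit yourself, something like the following should work in a scratch repository. The file name, messages and identity are placeholders, and `git init -b` assumes Git 2.28 or later.

```shell
set -e
# Placeholder identity so the commits work without any global configuration.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q -b main
echo "hello" > README.txt
git add README.txt
git commit -q -m "first commit"
echo "more" >> README.txt
git commit -q -a -m "second commit"

git cat-file -p HEAD     # the metadata: tree, parent, author, committer, message
git ls-tree -r HEAD      # the snapshot: every file saved in this commit
parents=$(git cat-file -p HEAD | grep -c '^parent ')
echo "the second commit lists $parents previous commit(s)"
```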
To keep the answer short, we won't go into any more detail here except to say this: the files in the Git objects database are quite useless to anything but Git itself. In order for you to get any actual work done, you must have Git extract a commit. To do that, you use `git switch` or the older `git checkout` command to check out one specific commit (usually the tip commit of a branch; the word branch is badly abused in Git and has multiple meanings, but with some practice you'll soon be flinging that word around as carelessly and automatically as everyone else).
Anyway, when you check out a branch, Git will copy that commit's frozen files out into ordinary files. This gives you a work area, which Git calls your working tree or work-tree, in which you have real, actual, ordinary computer files, not some special frozen-for-all-time Git-ized useless compressed things. But these files are not in Git. They came out of the commit you just checked out, but they are not in Git. In other words, when you work in a Git repository, you work on files that are not in Git at all. That's why it's important to commit often: only the committed files go into Git, and it can only get you back stuff you commit. Everything else is not in Git and Git can't help you with it.
Git stores commits and other Git objects in one of its two main databases. These have big ugly random-looking (but not actually random) hash IDs.
Git stores names, like branch names, in the other of its two main databases. These let Git find the hash IDs so that you can extract commits.
When you check out or switch to some branch name, you're really checking out one particular commit: the tip commit of that branch. This gets you ordinary files to work on. Those files aren't in Git.
When you use `git add` and `git commit` to make a new commit, Git makes the new commit, saving the files (and your name etc) within the database. The database mainly only ever gets added to, so once saved in a commit, you can almost always get all of this stuff back later. (The weasel wording here is because there are several ways to "lose" a commit, though it's deliberately hard to do except by nuking the entire repository, or losing the computer or its storage, and those are things Git can't help you with afterwards!)
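The points above can be tried out in a scratch repository. This is only a sketch: `notes.txt`, its contents and the identity are invented, and `git init -b` assumes Git 2.28 or later.

```shell
set -e
# Placeholder identity so the commit works without any global configuration.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q -b main
echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "save version 1"    # version 1 is now in Git

echo "version 2 (never committed)" > notes.txt   # only in the working tree
git checkout -- notes.txt            # re-extract the saved copy over our edit
restored=$(cat notes.txt)
echo "working tree now has: $restored"
```

The uncommitted "version 2" text is gone for good after the last step, which is the flip side of "Git can only get you back stuff you commit".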
When you run `git init` and it creates a new repository, it tells you this:
$ git init
Initialized empty Git repository in ...
If you already have a repository, `git init` does nothing (well, almost nothing) and says instead:
$ git init
Reinitialized existing Git repository in ...
In both cases it tells you where the repository directory went (for "directory", read "folder" if you prefer that term). If I ran `git init` in a new empty `/tmp/dir`, for instance, I would get `Initialized empty Git repository in /tmp/dir/.git`. That `.git` directory holds the actual repository. Removing the `/.git` part leaves us with the path to the working tree, in this case `/tmp/dir`.
Remember that you (and Git) will put ordinary files into your working tree. You'll then use `git add` and `git commit` to make Git copy the working tree version into a new commit, which will go into the database that's somewhere under the `.git` directory.
At all times, the working tree files are yours to play with however you like. You're supposed to keep your hands off all the Git files inside the `.git` directory, though. And remember, some Git commands, like `git switch` or `git reset --hard`, are directives to Git telling it to mess with your working tree files. The computer generally will do whatever you tell it to do, even if you didn't quite mean what you said, so be at least a little careful about what you tell the computer to do.
`git clone`, `git fetch`, `git push`
Once you have a repository, you can let others copy it. Or, if you don't have a repository here (on "this computer", whatever computer "this computer" is, or in "this directory" / "this folder", wherever that is), but someone else does, you can copy it. The usual way to copy an entire repository is to use `git clone`.
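What `git clone` really means, though, is:
1. create a new, empty directory (folder);
2. create a new, empty repository in it, with `git init`;
3. get the other repository's commits, with `git fetch`;
4. create one branch name and check it out.
Step 1 uses whatever command your computer uses. Step 2 uses `git init`, and in the early days of Git, `git clone` literally ran `git init` (though now it's all fancied-up and mixed together to make it more efficient / faster). The third step literally ran `git fetch`, and still works the same way.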
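Spelled out by hand, a clone might look roughly like this. It is a sketch of the idea, not what `git clone` literally executes today; the `theirs` and `ours` directories, the file name and the identity are invented, and `git init -b` assumes Git 2.28 or later.

```shell
set -e
# Placeholder identity so the commit works without any global configuration.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
work=$(mktemp -d)

# A stand-in for "their" repository, with one commit on main.
git init -q -b main "$work/theirs"
echo "hello" > "$work/theirs/hello.txt"
git -C "$work/theirs" add hello.txt
git -C "$work/theirs" commit -q -m "their first commit"

mkdir "$work/ours"                      # make a new, empty directory
cd "$work/ours"
git init -q                             # make a new, empty repository in it
git remote add origin "$work/theirs"    # record where the other repository lives
git fetch -q origin                     # copy their commits; creates origin/main
git checkout -q -b main origin/main     # make our own branch name and check it out
```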
What `git fetch` does is to call up some other Git software that's hooked up to some other Git repository. The two Git software programs speak with each other. They use the hash IDs to figure out which commits the sending Git has, that the receiving Git lacks. When we're using `git clone`, we've just made a totally-empty repository (one with no commits at all) and so we lack every commit, and therefore we download every commit from the other repository.
In other words, this `git clone` step copies their commit objects database. A later `git fetch` hooks up with them again, and gets new commits, but doesn't bother re-copying existing commits. All commits are frozen for all time, and their hash IDs are unique to those particular commits (no other commit anywhere, in any Git repository, is allowed to use that ID ever again), so just checking the IDs suffices, and your `git fetch` will efficiently add their new commits to your repository, without affecting any new commits you made, which have their own unique IDs.
(Now you know why the IDs are so big and ugly. They have to be, or your software might accidentally re-use an ID.)
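Note that `git clone` does not copy their names database. Instead, they list out their names, and our Git software modifies those names: each of their branch names becomes one of our remote-tracking names, with `origin/` stuck in front.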
These remote-tracking names (Git calls them remote-tracking branch names) let our Git software remember, in our Git repository, their Git repository's branch names, without using up any of our own branch names. If they have branches named `main` and `develop`, for instance, our Git will create names `origin/main` and `origin/develop`. That keeps the names `main` and `develop` and so on available for us to use for our commits.
So, a `git fetch`, whether it's the initial one into a new empty repository made by `git clone`, or one into a mostly-full repository, gets any of their new commits and copies those into our Git objects database, and then finds the commit hash IDs their branch names currently hold and updates our remote-tracking names. We now have all of their commits, plus any new ones we've made, and we have memories for all their branch names.
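To see remote-tracking names appear, one could clone a small local repository that has both `main` and `develop`. The `theirs` and `ours` directories, the file name and the identity are invented for this sketch, and `git init -b` assumes Git 2.28 or later.

```shell
set -e
# Placeholder identity so the commit works without any global configuration.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
work=$(mktemp -d)

# "Their" repository: one commit, two branch names.
git init -q -b main "$work/theirs"
echo "hello" > "$work/theirs/hello.txt"
git -C "$work/theirs" add hello.txt
git -C "$work/theirs" commit -q -m "their first commit"
git -C "$work/theirs" branch develop

git clone -q "$work/theirs" "$work/ours"
cd "$work/ours"
git branch       # our own branch names: only main
git branch -r    # our memory of theirs: origin/develop and origin/main (plus origin/HEAD)
```

Note that there is no `develop` of our own after the clone, only `origin/develop`.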
Our last step of `git clone` is that our Git will make, in our repository, one branch name. Git doesn't actually need any branch names to get stuff done: Git uses the hash IDs. But we, humans, need branch names. So our Git takes the branch name we told it to use when we ran `git clone`, and makes a branch with that name.
But hold on: we probably ran `git clone https://github.com/blah/repo.git`, or something like that. We didn't tell it any branch name! Well, in that case, our Git software asks the GitHub Git software: What branch name do you recommend? They'll probably say `main`, and that's the name our Git will create. If we want something else, we can use, e.g., `git clone -b develop url`, to tell our Git to make the name `develop`.
In any case, whatever name we pick (or have them pick), our Git will check our repository for a remote-tracking name that resembles this name. So our Git will look for `origin/main`, or `origin/develop`, or whatever. On finding that name, our Git will create the name (`main` or `develop` or whatever) and store in it the same hash ID that their Git is using for their branch name.
The end result of all this is that we have a different branch name that's merely spelled the same. Their `main` is our `origin/main`; our `main` is our own `main`. These are two different branch names, even though they're both `main`! And our remote-tracking name `origin/main`, which remembers their branch name `main`, is not a branch name.
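One way to convince yourself that `main` and `origin/main` are separate names is to make a commit after cloning and compare them. Again a sketch with invented directory names, file name and identity; `git init -b` assumes Git 2.28 or later.

```shell
set -e
# Placeholder identity so the commits work without any global configuration.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
work=$(mktemp -d)

git init -q -b main "$work/theirs"
echo "hello" > "$work/theirs/hello.txt"
git -C "$work/theirs" add hello.txt
git -C "$work/theirs" commit -q -m "their first commit"

git clone -q "$work/theirs" "$work/ours"
cd "$work/ours"
theirs_main=$(git rev-parse origin/main)   # what their main held when we cloned

echo "local change" >> hello.txt
git commit -q -a -m "our own commit"       # moves our main only

git rev-parse main origin/main             # two different hash IDs now
```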
(See what I mean about Git beating the word branch to death? What exactly do we mean by "branch"?)
Once we've made our own commits, we may want to send those commits back to GitHub (or wherever it is that we cloned from). To do that, we'll use `git push`. The `git push` command is a lot like `git fetch`, but different:
- `git push` needs the name of the remote, `origin`.
- `git push` needs the name of our branch.
So we'd run, say, `git push origin develop` or `git push origin main`, depending on what name we're using.
What this does is have our Git software call up their Git software, just like for `git fetch`, but turn the direction around. This time, instead of getting new commits they have that we lack, we'll send them our new commits that we have that they lack. So our Git will check their `main` or `develop` or whatever and see that we made one or two new commits, and will send over our new commits.
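Here is a sketch of a push that simply adds commits, using a local bare repository as a stand-in for the hosted one. The `hosted.git` and `ours` paths, the file name and the identity are invented, and `git init -b` assumes Git 2.28 or later.

```shell
set -e
# Placeholder identity so the commit works without any global configuration.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
work=$(mktemp -d)

git init -q --bare -b main "$work/hosted.git"   # empty stand-in for the hosted repository

git init -q -b main "$work/ours"
cd "$work/ours"
git remote add origin "$work/hosted.git"
echo "hello" > hello.txt
git add hello.txt
git commit -q -m "our first commit"

git push -q origin main                    # send our commits, ask them to set their main
git -C "$work/hosted.git" rev-parse main   # their main now holds our commit's hash ID
```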
Here, again, `push` and `fetch` differ. Having sent over our new commits, our Git software now asks them, politely, if they would please set their branch of the same name to hold the new commit hash ID. They will agree if and only if this merely adds new commits to their repository. We won't go into all the details here, but if you're working with other people, on a shared hosted repository, and multiple people `git push` to the same name on that shared hosted repository, sometimes your `git push` would want to drop their commits. In that case, their Git will say no, I won't do that, if I took your commits I'd have to drop Fred's and Susan's or whatever.
If your repositories are all private (not shared like this), this sort of thing is pretty rare. You may still encounter it, depending on how you use Git! But it won't happen unless you make it happen yourself. So, again, we won't go into any of the details.
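The refusal can be reproduced locally. In this sketch a bare repository plays the shared hosted one, and "Fred" is just a second working copy; all paths, file names and the identity are invented, and `git init -b` assumes Git 2.28 or later.

```shell
set -e
# Placeholder identity so the commits work without any global configuration.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
work=$(mktemp -d)
git init -q --bare -b main "$work/hosted.git"

# Fred seeds the shared repository with a first commit.
git init -q -b main "$work/fred"
cd "$work/fred"
git remote add origin "$work/hosted.git"
echo "base" > base.txt
git add base.txt
git commit -q -m "base"
git push -q origin main

# We clone; then Fred pushes one more commit that we do not have.
git clone -q "$work/hosted.git" "$work/ours"
echo "fred" > fred.txt
git add fred.txt
git commit -q -m "Fred's work"
git push -q origin main

# Our new commit builds on the old tip, so accepting it would drop Fred's.
cd "$work/ours"
echo "ours" > ours.txt
git add ours.txt
git commit -q -m "our work"
if git push -q origin main 2>/dev/null; then result=accepted; else result=rejected; fi
echo "push was $result"
```

With the error output left visible, the failed push reports the same `! [rejected] main -> main (fetch first)` message shown in the question.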
To copy a repository from somewhere else, use `git clone`. This makes a new repository, adds a remote `origin`, fetches with `git fetch`, and creates a branch name and then does a checkout / switch.
To send your commits to another repository, use `git push`. That other repository needs to exist already, and you must be adding commits to it. If you use GitHub to create a repository, be sure to create an empty repository, not one with some commits already in it, unless you're going to `git clone` that repository first, and only then add new commits.
To get new commits from another repository that you've already cloned, use `git fetch`.
Fetch is the opposite of push, but they're not true opposites: "fetch" can fetch all branches at once and is always safe because of remote-tracking names, but "push" pushes one branch at a time and doesn't have anything like remote-tracking names.
Beware of tutorials that talk about `git pull` here. Pull means run `git fetch`, then run a second Git command. That second command, which is normally `git merge` but you can make it be `git rebase`, is complicated! Pull is not the opposite of push at all. Fetch is as close as you get to the opposite here.
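Running the two halves of a pull separately might look like this. It is a deliberately easy case where the second command only has to fast-forward; the directory names, file names and identity are invented, and `git init -b` assumes Git 2.28 or later.

```shell
set -e
# Placeholder identity so the commits work without any global configuration.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
work=$(mktemp -d)

git init -q -b main "$work/theirs"
echo "hello" > "$work/theirs/hello.txt"
git -C "$work/theirs" add hello.txt
git -C "$work/theirs" commit -q -m "their first commit"
git clone -q "$work/theirs" "$work/ours"

# They make a new commit after we cloned.
echo "more" > "$work/theirs/more.txt"
git -C "$work/theirs" add more.txt
git -C "$work/theirs" commit -q -m "their second commit"

cd "$work/ours"
git fetch -q origin                    # first half: get their commits, update origin/main
git merge -q --ff-only origin/main     # second half: a separate command that moves our main
```

After the fetch alone, our `main` has not moved at all; only `origin/main` has. That is why fetch is the safe half.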
Always start using Git with a practice repository that you don't care about too much. 😀 You will stumble into oddities. You will sometimes want to just destroy the practice repository and start over.
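For the situation in the question, the "clone first, and only then add new commits" route could be rehearsed locally along these lines. This is a sketch under assumptions: a local bare repository stands in for the GitHub repository `practice`, the file names inside `project_abc` are invented, `project_abc` is assumed not to contain a `.git` directory of its own, and `git init -b` assumes Git 2.28 or later.

```shell
set -e
# Placeholder identity so the commits work without any global configuration.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"
work=$(mktemp -d)

# Stand-in for the existing hosted repository, which already has a commit.
git init -q --bare -b main "$work/practice.git"
git init -q -b main "$work/seed"
cd "$work/seed"
echo "practice" > README.md
git add README.md
git commit -q -m "initial commit"
git push -q "$work/practice.git" main

# Stand-in for an existing local project folder (Git does not store empty folders).
mkdir -p "$work/project_abc/basic" "$work/project_abc/extended"
echo "basic" > "$work/project_abc/basic/notes.txt"
echo "extended" > "$work/project_abc/extended/notes.txt"

# Clone first, then add the project inside the clone, commit, and push.
git clone -q "$work/practice.git" "$work/practice"
cp -R "$work/project_abc" "$work/practice/"
cd "$work/practice"
git add project_abc
git commit -q -m "Add project_abc"
git push -q origin main
```

Because the new commit is built on top of the commit that was already in the hosted repository, the push only adds commits and is accepted.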
Upvotes: 1