Virat Chauhan

Reputation: 33

Cannot push my project to an existing GitHub repository: "Updates were rejected because the remote contains work that you do not have locally"

After reading the GitHub documentation, I wanted to push all of my existing local projects to a pre-existing repository.

For a folder like this:

project_abc
  - basic
  - extended

I wanted to push it to a repository named "practice" such that project_abc lies inside it.

But I stumbled across errors on my very first such folder.

On attempting to push, I get this:

 ! [rejected]        main -> main (fetch first)
error: failed to push some refs to 'github.com:Githubuser/practice.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

When pulling in the hope of fixing the issue, I get:

 fatal: refusing to merge unrelated histories

Upvotes: 1

Views: 3731

Answers (1)

torek

Reputation: 487755

Your basic error is that you are conflating project and repository here.

Some people do like to put multiple (albeit usually related) projects into a single repository. The jargon term for this is monorepo: a monorepo stores all those projects in one repository (a Git repository in this case, though other version control systems often work very similarly), which makes many things more convenient and some other things less convenient. The linked Wikipedia page has a decent overview.

To put multiple projects into a monorepo, you make the one repository—i.e., run git init once—and then organize everything into that one repository. You can then clone that repository to make your distributed copies of that repository, using git clone and git fetch and git push as appropriate (see below).
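As a minimal sketch of the monorepo layout, here is one way to build it from scratch, using the practice and project_abc names from the question. The scratch directory, the file contents, and the demo identity are placeholders of mine, not anything from the real project:

```shell
# Placeholder identity so the commit works on a machine with no Git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

# Hypothetical scratch location; "practice" plays the role of the monorepo.
workdir=$(mktemp -d)
mkdir -p "$workdir/practice/project_abc/basic" "$workdir/practice/project_abc/extended"
echo "basic notes" > "$workdir/practice/project_abc/basic/README.txt"
echo "extended notes" > "$workdir/practice/project_abc/extended/README.txt"

cd "$workdir/practice"
git init -q                              # run once, for the whole monorepo
git symbolic-ref HEAD refs/heads/main    # name the first branch "main"
git add project_abc
git commit -q -m "Add project_abc"

# Every project lives under the one repository's working tree.
git ls-files
```

Each further project would become another folder next to project_abc, added with another git add and git commit in this same repository.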

To put multiple projects into multiple repositories, make each repository separately with an appropriate git init, organize the one project within that repository, and so on. The process is the same as for the monorepo setup, except that now you have multiple repositories. The obvious term for this would be polyrepo, but that appears not to be in common use. At least, there's no Wikipedia page. (The term does turn up in Google searches, and I will use it here.)

What to know before you start

Git is a distributed version control system, or DVCS. A version control system holds versions (of files). These come in many flavors, and lately the usual distinction is distributed versus centralized. A centralized VCS has an authoritative "this is the correct one" location: there might be additional copies, but only the central VCS has the "real" copies. A distributed VCS lacks this central control point: each clone is "the real copy" even if they're all slightly different.

As with monorepo vs polyrepo, CVCS vs DVCS offers advantages and disadvantages. We won't try to cover those here since you've already chosen the DVCS.

Git in particular stores versions as commits. Commits are Git's raison d'être and a repository is mainly just a big database of commits. However, each commit in a Git repository is found by a very large, random-looking number, expressed in hexadecimal as a big ugly hash ID. These are impossible for humans to remember and to deal with, so Git provides a second database of names: branch names, tag names, remote-tracking names, and all kinds of other names. A Git repository will use its secondary names database to remember the raw hash IDs for you, so that you can use human-oriented names to extract your versions.

A Git repository is therefore really two databases:

  • The big database contains commits, and other supporting Git objects. These are numbered (with big ugly hash IDs); Git needs the hash IDs in order to use this database.

  • The smaller (usually much smaller) database contains names, which you (a human) will use to have Git find the hash IDs so that Git can find the commits in the (usually much bigger) objects database.
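You can peek at both databases in a throwaway repository. This is only a sketch: the scratch directory, file name, and demo identity are made up for illustration.

```shell
# Placeholder identity so the commit works on a machine with no Git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)                        # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch "main"
echo "hello" > file.txt
git add file.txt
git commit -q -m "First commit"

# Names database: each name holds one hash ID.
git for-each-ref --format='%(refname) -> %(objectname)'

# Objects database: the hash ID is the key Git uses to look things up.
commit_id=$(git rev-parse main)
git cat-file -t "$commit_id"             # prints the object's type
```

Here git rev-parse turns the human-oriented name main into the hash ID, and git cat-file uses that hash ID to find the object.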

Each commit's unique number lets Git extract that one particular commit later. Every commit stores two things:

  • Each commit has a full snapshot of every file, frozen in time in the form it had when you (or whoever) made the snapshot. To save space, these are stored in a special, read-only, Git-only, compressed and de-duplicated format. Only Git can read them, and literally nothing, not even Git itself, can overwrite them. But this makes them almost useless, except as we'll see below.

  • Each commit also has some metadata, or information about that particular commit. This includes stuff like the name and email address of whoever made the commit, and some date-and-time stamps. This also includes a list of previous commit hash IDs, usually with exactly one entry in the list.
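To see the snapshot and the metadata for yourself, something like the following should do, again in a scratch repository with made-up file names and a placeholder identity:

```shell
# Placeholder identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)                        # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main

echo "one" > a.txt
git add a.txt
git commit -q -m "First"
first=$(git rev-parse HEAD)

echo "two" > b.txt
git add b.txt
git commit -q -m "Second"

# Metadata: tree (the snapshot), parent (the previous commit), author, committer.
git cat-file -p HEAD

# Snapshot: every file is listed, not only the one this commit added.
git ls-tree -r --name-only HEAD

# The "previous commit" entry of the second commit is the first commit.
parent=$(git rev-parse HEAD^)
```

The very first commit has no parent line at all, which is the one common case where the list of previous commits is empty.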

To keep the answer short, we won't go into any more detail here except to say this: the files in the Git objects database are quite useless to anything but Git itself. In order for you to get any actual work done, you must have Git extract a commit. To do that, you use git switch or the older git checkout command to check out one specific commit (usually the tip commit of a branch; the word branch is badly abused in Git and has multiple meanings, but with some practice you'll soon be flinging that word around as carelessly and automatically as everyone else).

Anyway, when you check out a branch, Git will:

  • remove, from your work area, any files that are there because of some previous checkout (if this is your first checkout of any commit, there are no such files); then
  • install, into your work area, all the files from the commit you selected.

This gives you a work area—which Git calls your working tree or work-tree—in which you have real, actual, ordinary computer files, not some special frozen-for-all-time Git-ized useless compressed things. But these files are not in Git. They came out of the commit you just checked out, but they are not in Git. In other words, when you work in a Git repository, you work on files that are not in Git at all. That's why it's important to commit often: only the committed files go into Git, and it can only get you back stuff you commit. Everything else is not in Git and Git can't help you with it.
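A small experiment makes the "only committed files are in Git" point concrete. This is a sketch in a scratch repository; notes.txt and scratch.txt are invented names.

```shell
# Placeholder identity so the commit works on a machine with no Git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)                        # hypothetical scratch repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main

echo "committed text" > notes.txt
git add notes.txt
git commit -q -m "Save notes"

# Damage the working-tree copy, then lose it entirely.
echo "oops" > notes.txt
rm notes.txt

# The committed copy is still in Git; have Git extract it again.
git checkout -q HEAD -- notes.txt
cat notes.txt

# A file that was never committed is not in Git, so Git cannot bring it back.
echo "never added" > scratch.txt
rm scratch.txt
git checkout HEAD -- scratch.txt 2>/dev/null || echo "scratch.txt is not in any commit"
```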

Review

  • Git stores commits and other Git objects in one of its two main databases. These have big ugly random-looking (but not actually random) hash IDs.

  • Git stores names, like branch names, in the other of its two main databases. These let Git find the hash IDs so that you can extract commits.

  • When you check out or switch to some branch name, you're really checking out one particular commit: the tip commit of that branch. This gets you ordinary files to work on. Those files aren't in Git.

  • When you use git add and git commit to make a new commit, Git makes the new commit, saving the files (and your name etc) within the database. The database mainly only ever gets added to, so once saved in a commit, you can almost always get all of this stuff back later. (The weasel wording here is because there are several ways to "lose" a commit, though it's deliberately hard to do except by nuking the entire repository, or losing the computer or its storage—and those are things Git can't help you with afterwards!)

Creating a new repository, and where the working tree is

When you run git init and it creates a new repository, it tells you this:

$ git init
Initialized empty Git repository in ...

If you already have a repository, git init does nothing (well, almost nothing) and says instead:

$ git init
Reinitialized existing Git repository in ...

In both cases it tells you where the repository directory went (for "directory", read "folder" if you prefer that term). If I ran git init in a new empty /tmp/dir, for instance, I would get Initialized empty Git repository in /tmp/dir/.git. That .git directory holds the actual repository. Removing the /.git part leaves us with the path to the working tree, in this case /tmp/dir.
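If you ever lose track of which directory is which, Git can tell you. A quick sketch in a scratch directory (the path comes from mktemp, so yours will differ):

```shell
demo=$(mktemp -d)                # hypothetical empty directory
cd "$demo"

git init -q                      # first run creates .git (-q hides the message)
git init                         # second run reports the existing repository

git rev-parse --git-dir          # the repository directory, relative to here
git rev-parse --show-toplevel    # the top of the working tree that goes with it
```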

Remember that you (and Git) will put ordinary files into your working tree. You'll then use git add and git commit to make Git copy the working tree version into a new commit, which will go into the database that's somewhere under the .git directory.

At all times, the working tree files are yours to play with however you like. You're supposed to keep your hands off all the Git files inside the .git directory, though—and remember, some Git commands, like git switch or git reset --hard, are directives to Git telling it to mess with your working tree files. The computer generally will do whatever you tell it to do, even if you didn't quite mean what you said, so be at least a little careful about what you tell the computer to do.

The distributed part: git clone, git fetch, git push

Once you have a repository, you can let others copy it. Or, if you don't have a repository here (on "this computer", whatever computer "this computer" is, or in "this directory" / "this folder", wherever that is), but someone else does, you can copy it. The usual way to copy an entire repository is to use git clone.

What git clone really means, though, is:

  1. create an empty directory;
  2. inside that empty directory, run git init;
  3. now that there's a Git repository there, set it up and use git fetch;
  4. then create and check out some branch.

Step 1 uses whatever directory-creating command your computer provides (mkdir, for instance). Step 2 uses git init, and in the early days of Git, git clone literally ran git init (though now it's all fancied-up and mixed together to make it more efficient / faster). The third step literally ran git fetch, and still works the same way.
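The four steps can be tried by hand. The sketch below assumes a local bare repository standing in for the hosted one, so that nothing touches the network; upstream.git, seed, and copy are names I made up, and the "set it up" part of step 3 is assumed to be a plain git remote add.

```shell
# Placeholder identity so the commit works on a machine with no Git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

base=$(mktemp -d)

# Stand-in for the hosted repository: a local bare repo holding one commit.
git init -q --bare "$base/upstream.git"
git --git-dir="$base/upstream.git" symbolic-ref HEAD refs/heads/main
git init -q "$base/seed"
cd "$base/seed"
git symbolic-ref HEAD refs/heads/main
echo "hello" > README.txt
git add README.txt
git commit -q -m "Initial commit"
git push -q "$base/upstream.git" main

# The steps of git clone, done by hand.
mkdir "$base/copy"                            # 1. create an empty directory
cd "$base/copy"
git init -q                                   # 2. run git init inside it
git remote add origin "$base/upstream.git"    # 3. set it up ...
git fetch -q origin                           #    ... and use git fetch
git checkout -q -b main origin/main           # 4. create and check out a branch

git log --oneline
```

Running git clone on "$base/upstream.git" should leave you in much the same state with a single command.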

What git fetch does is to call up some other Git software that's hooked up to some other Git repository. The two Git software programs speak with each other. They use the hash IDs to figure out which commits the sending Git has, that the receiving Git lacks. When we're using git clone, we've just made a totally-empty repository—one with no commits at all—and so we lack every commit, and therefore we download every commit from the other repository.

In other words, this git clone step copies their commit objects database. A later git fetch hooks up with them again, and gets new commits, but doesn't bother re-copying existing commits. All commits are frozen for all time, and their hash IDs are unique to those particular commits—no other commit anywhere, in any Git repository, is allowed to use that ID ever again—so just checking the IDs suffices, and your git fetch will efficiently add their new commits to your repository, without affecting any new commits you made, which have their own unique IDs.

(Now you know why the IDs are so big and ugly. They have to be, or your software might accidentally re-use an ID.)

Note that git clone does not copy their names database. Instead, they list out their names, and our Git software modifies those names:

  • For each tag name they list, we might copy the tag as-is.
  • But for each branch name they list, our Git makes a remote-tracking name.

These remote-tracking names—Git calls them remote-tracking branch names—let our Git software remember, in our Git repository, their Git repository's branch names, without using up any of our own branch names. If they have branches named main and develop, for instance, our Git will create names origin/main and origin/develop. That keeps the names main and develop and so on available for us to use for our commits.

So, a git fetch, whether it's the initial one into a new empty repository made by git clone, or one into a mostly-full repository, gets any of their new commits and copies those into our Git objects database, then looks up the commit hash ID that each of their branch names currently holds and updates our remote-tracking names to match. We now have all of their commits, plus any new ones we've made, and we have memories for all their branch names.

Our last step of git clone is that our Git will make, in our repository, one branch name. Git doesn't actually need any branch names to get stuff done: Git uses the hash IDs. But we, humans, need branch names. So our Git takes the branch name we told it to use when we ran git clone, and makes a branch with that name.

But hold on: we probably ran git clone https://github.com/blah/repo.git, or something like that. We didn't tell it any branch name! Well, in that case, our Git software asks the GitHub Git software: What branch name do you recommend? They'll probably say main, and that's the name our Git will create. If we want something else, we can use, e.g., git clone -b develop url, to tell our Git to make the name develop.

In any case, whatever name we pick—or have them pick—our Git will check our repository for a remote-tracking name that resembles this name. So our Git will look for origin/main, or origin/develop, or whatever. On finding that name, our Git will create the name—main or develop or whatever—and store in it the same hash ID that their Git is using for their branch name.

The end result of all this is that we have a different branch name that's merely spelled the same. Their main is our origin/main; our main is our own main. These are two different branch names, even though they're both main! And our remote-tracking name origin/main, which remembers their branch name main, is not a branch name.
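Here is a sketch that shows the two sets of names side by side. It assumes a local bare repository playing the part of GitHub, with branches main and develop as in the example above; all paths are invented.

```shell
# Placeholder identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

base=$(mktemp -d)

# "Their" repository: a local bare repo with branches main and develop.
git init -q --bare "$base/upstream.git"
git --git-dir="$base/upstream.git" symbolic-ref HEAD refs/heads/main
git init -q "$base/seed"
cd "$base/seed"
git symbolic-ref HEAD refs/heads/main
echo "hello" > README.txt
git add README.txt
git commit -q -m "Initial commit"
git branch develop
git push -q "$base/upstream.git" main
git push -q "$base/upstream.git" develop

# "Our" repository.
git clone -q "$base/upstream.git" "$base/clone"
cd "$base/clone"

# One branch name of our own, plus remote-tracking names for their branches.
git for-each-ref --format='%(refname)' refs/heads refs/remotes

# A new commit moves our main; origin/main still remembers their main.
echo "local change" >> README.txt
git commit -q -am "Local work"
ours=$(git rev-parse main)
theirs=$(git rev-parse origin/main)
echo "main:        $ours"
echo "origin/main: $theirs"
```

Note that the clone has origin/develop but no develop of its own: our Git only created the one branch name.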

(See what I mean about Git beating the word branch to death? What exactly do we mean by "branch"?)

Once we've made our own commits, we may want to send those commits back to GitHub (or wherever it is that we cloned from). To do that, we'll use git push. The git push command is a lot like git fetch, but different:

  • git push needs the name of the remote, origin.
  • After that, git push needs the name of our branch.

So we'd run, say, git push origin develop or git push origin main, depending on what name we're using.

What this does is have our Git software call up their Git software—just like for git fetch—but turn the direction around. This time, instead of getting new commits they have that we lack, we'll send them our new commits that we have that they lack. So our Git will check their main or develop or whatever and see that we made one or two new commits, and will send over our new commits.

Here, again, push and fetch differ. Having sent over our new commits, our Git software now asks them, politely, if they would please set their branch of the same name to hold the new commit hash ID. They will agree if and only if this merely adds new commits to their repository. We won't go into all the details here, but if you're working with other people, on a shared hosted repository, and multiple people git push to the same name on that shared hosted repository, sometimes your git push would ask them to drop commits that other people have already pushed. In that case, their Git will say "no, I won't do that: if I took your commits I'd have to drop Fred's and Susan's" or whatever. That refusal is what the "! [rejected] ... (fetch first)" message in the question reports.

If your repositories are all private (not shared like this), this sort of thing is pretty rare. You may still encounter it, depending on how you use Git! But it won't happen unless you make it happen yourself. So, again, we won't go into any of the details.
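To watch a rejection happen without involving GitHub, you can stage one locally. This sketch assumes a bare repository named shared.git and two clones named fred and susan (all hypothetical); Fred pushes first, so Susan's push would have to drop his commit.

```shell
# Placeholder identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

base=$(mktemp -d)

# The shared hosted repository, seeded with one commit.
git init -q --bare "$base/shared.git"
git --git-dir="$base/shared.git" symbolic-ref HEAD refs/heads/main
git init -q "$base/seed"
cd "$base/seed"
git symbolic-ref HEAD refs/heads/main
echo "start" > README.txt
git add README.txt
git commit -q -m "Initial commit"
git push -q "$base/shared.git" main

# Two people clone the same starting point.
git clone -q "$base/shared.git" "$base/fred"
git clone -q "$base/shared.git" "$base/susan"

# Fred commits and pushes first; this only adds commits, so it is accepted.
cd "$base/fred"
echo "fred" > fred.txt
git add fred.txt
git commit -q -m "Fred's work"
git push -q origin main

# Susan's commit does not build on Fred's, so accepting it would drop his.
cd "$base/susan"
echo "susan" > susan.txt
git add susan.txt
git commit -q -m "Susan's work"
if git push -q origin main 2>/dev/null; then
    push_result=accepted
else
    push_result=rejected
fi
echo "Susan's push was $push_result"
```

Drop the 2>/dev/null to read the full message, which should look much like the one in the question.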

Review of part 2

  • To copy a repository from somewhere else, use git clone. This makes a new repository, adds a remote origin, fetches with git fetch, and creates a branch name and then does a checkout / switch.

  • To send your commits to another repository, use git push. That other repository needs to exist already, and you must be adding commits to it. If you use GitHub to create a repository, be sure to create an empty repository, not one with some commits already in it, unless you're going to git clone that repository first, and only then add new commits.

  • To get new commits from another repository that you've already cloned, use git fetch.

  • Fetch is the closest thing to an opposite of push, but they're not true opposites: "fetch" can fetch all branches at once and is always safe because of remote-tracking names, but "push" is normally run for one branch at a time and doesn't have anything like remote-tracking names.

  • Beware of tutorials that talk about git pull here. Pull means run git fetch, then run a second Git command. That second command, which is normally git merge but you can make it be git rebase, is complicated! Pull is not the opposite of push at all. Fetch is as close as you get to the opposite here.
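The two halves of a pull can be run separately, which makes it easier to see what each one does. A sketch under the same assumptions as before: a local bare repository stands in for the remote, and every path is made up.

```shell
# Placeholder identity so the commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

base=$(mktemp -d)

# The other repository, seeded with one commit, and our clone of it.
git init -q --bare "$base/upstream.git"
git --git-dir="$base/upstream.git" symbolic-ref HEAD refs/heads/main
git init -q "$base/seed"
cd "$base/seed"
git symbolic-ref HEAD refs/heads/main
echo "v1" > file.txt
git add file.txt
git commit -q -m "First"
git push -q "$base/upstream.git" main
git clone -q "$base/upstream.git" "$base/clone"

# Someone else adds a commit to the other repository.
echo "v2" > file.txt
git commit -q -am "Second"
git push -q "$base/upstream.git" main

cd "$base/clone"
before=$(git rev-parse main)

git fetch -q origin                  # step one of a pull: get their new commits
after_fetch=$(git rev-parse main)    # our main has not moved; origin/main has

git merge -q origin/main             # step two: the second Git command
after_merge=$(git rev-parse main)

cat file.txt
```

Running the steps separately gives you a chance to look at origin/main (with git log, say) before deciding what the second command should be.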

Practice

Always start using Git with a practice repository that you don't care about too much. 😀 You will stumble into oddities. You will sometimes want to just destroy the practice repository and start over.

Upvotes: 1
