Huy Phạm

Reputation: 1023

(Git) Push a previous commit to a new repository without losing current changes

I've been developing an application. As the application scales up, it makes sense to build my own code base to reuse in the future, so I don't have to do everything all over again. I've created a completely new repo named MyCodeBase. Now I want to push a specific previous commit to this new repository, without losing any changes (unstaged files and unpushed commits) I've already made in the current repository. Is this possible? What I've tried:

 git push MyCodeBase <commit_SHA>:HEAD:main 

(error: src refspec <commit_SHA>:HEAD does not match any)

git push MyCodeBase HEAD <commit_SHA>:main 

(error: The destination you provided is not a full refname) We tried to guess what you meant by:

  • Looking for a ref that matches 'main' on the remote side.
  • Checking if the <src> being pushed ('<commit_SHA>') is a ref in "refs/{heads,tags}/". If so we add a corresponding
    refs/{heads,tags}/ prefix on the remote side.

Neither worked, so we gave up. You must fully qualify the ref.
hint: The <src> part of the refspec is a commit object.
hint: Did you mean to create a new branch by pushing to
hint: '<commit_SHA>:refs/heads/main'?

git push MyCodeBase <commit_SHA>:main

error: The destination you provided is not a full refname (i.e., starting with "refs/"). We tried to guess what you meant by:

  • Looking for a ref that matches 'main' on the remote side.
  • Checking if the <src> being pushed ('<commit_SHA>') is a ref in "refs/{heads,tags}/". If so we add a corresponding
    refs/{heads,tags}/ prefix on the remote side.

Neither worked, so we gave up. You must fully qualify the ref.
hint: The <src> part of the refspec is a commit object.
hint: Did you mean to create a new branch by pushing to
hint: '<commit_SHA>:refs/heads/main'?
error: failed to push some refs to 'https://github.com/<my_user_name>/MyCodeBase.git'


Upvotes: 1

Views: 1065

Answers (1)

torek

Reputation: 487745

TL;DR

What you want is the git push <repository> <refspec> syntax, with the refspec being something along the lines of:

a123456:refs/heads/main

Be sure you know precisely what this does (i.e., read the long section).

Long

First, before you get too deeply into the details, remember that the definition of a Git repository is, more or less,[1] a collection of commits. Git isn't really about changes, or files, or branches, or any of the many things that we do with it, at the root level: at that level, it's about commits.

This means you need to know precisely what a commit is and does for you, and that boils down to just a few things:

  • Each commit stores a full snapshot of some set of files. These aren't changes, these are snapshots. They're analogous to a tarball or zip file. If you were to download and unpack an archive like this, you would not think of it as "changes". You could download two archives and compare them to find changes, of course. In the same way, you should not think of a Git commit as changes—but if you have two commits and compare them, you can find changes.

  • Meanwhile, each commit also stores some metadata: some information about the commit itself. This includes a name and email address, and a date-and-time stamp, for the person who made the commit. (In fact, it has two of these sets of data: one for the author, and one for the committer.) It has an arbitrary log message that whoever makes the commit gets to write, to tell people later why that particular commit exists. And—crucially for Git—each commit has some number of parent commit hash IDs stored inside it.

  • Each commit has a unique hash ID. When I say unique, I don't mean a little unique, either. I don't even mean "very unique", which is in some ways a misuse of the word. Git has, as a sort of goal, the idea of giving every commit anyone ever makes anywhere a hash ID that is peculiar to that one commit.[2]

  • No part of any commit can ever be changed, once it's made and has acquired its unique hash ID.
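The pieces listed above can be seen directly in a raw commit object. The following is a minimal sketch, assuming a throwaway repository in a temporary directory; the file name, identity, and commit messages are hypothetical and not part of the original answer.

```shell
set -eu

# Throwaway repository in a temporary directory (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "first commit"

echo "world" >> file.txt
git commit -q -a -m "second commit"

# Print the raw commit object: a tree line (the snapshot), a parent line,
# author and committer lines, a blank line, then the log message.
commit_text=$(git cat-file -p HEAD)
printf '%s\n' "$commit_text"

# The parent recorded inside the second commit is the first commit's hash ID.
stored_parent=$(printf '%s\n' "$commit_text" | sed -n 's/^parent //p')
first_commit=$(git rev-parse HEAD~1)
echo "stored parent: $stored_parent"
echo "first commit:  $first_commit"
```

The two hash IDs printed at the end should match, which is the "points back" relationship the rest of this answer relies on.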

This unique hash ID, and the unchanging nature of a commit, means that one Git can always tell whether some other Git already has the same commit, or not, just by having the two Gits exchange hash IDs. They need not exchange the entire set of files, or even some subset of the files. The hash ID alone tells the entire story.

In this way, the hash ID is the commit, in a sense. You either have the hash ID, in which case you have the commit, or you don't, in which case you find some Git—any Git anywhere—that does have that hash ID and get the commit from them.

What git push and git fetch are about, then, is making sure that the receiving Git—for git push, that's the "other" Git; for git fetch, that's your Git—has some or all of the commits that the sending Git wishes to send.


[1] I will touch on the "more or less" part as well in a moment.

[2] Git is allowed to fail in this goal as long as the two Git repositories in which some non-unique hash ID occurs never actually meet. But rather than try to guess which repositories might exchange data with each other, and which never will, Git tries to just make each commit hash ID universally unique.


Git doesn't "like" stand-alone commits

The "Git philosophy", as it were, is that you always have, at all times, all of the history of a repository. But what exactly is the history of a repository?

If we look at the definition of a commit again—an archive plus metadata, with the metadata including the raw hash ID of the parent or parents of a commit—we can pretty quickly draw a picture of what this could mean:

first-commit  <-... <-commit  <-commit  <-commit  <-... <-last-commit

Each of the "arrows" coming out of a commit, here, is really the hash ID of the earlier commit. We say that the later commit points to the earlier commit.

The actual hash IDs are random-looking, and also very large and ugly and impossible for humans to remember,[3] so for drawing purposes, I like to use uppercase letters to stand in for the hash IDs:

A <-B <-C   <--main

This is a drawing of a small repository with just three commits in it. It also has only one branch name, main. The name main serves to let Git know which of the three commits is the last one.

Obviously, in a tiny repository like this, we could just look at all three commits. One—commit A—doesn't point back at all: it's the very first commit, and it can't. One points to A, and that must be the second commit that we're calling B, and the last one points to B, so that must be the last commit of the three. But in a really big repository, there may be many thousands of commits. Finding the "last" one would take too long, and has other drawbacks. So Git adds branch names, and other names such as tag names, to the mix. This is what makes a repository more or less a collection of commits: It's really a collection of commits and some names by which we find some particular commits.

Branch names in particular find the last commit of a branch. This is also how we add a commit to a repository. If we're on main, and have:

A--B--C   <-- main

and we add a new commit, the new commit gets some random-looking, big ugly unique hash ID, which we'll call D. Inside D, the metadata includes the hash ID of the current commit C (which we, or Git, found by the name main). So new commit D points back to existing commit C. Now, in order to make commit D the last commit of this branch, Git simply writes D's hash ID, whatever that is, into the name main:

A--B--C--D   <-- main

and now we have more commits on our branch.
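As a rough illustration of the A--B--C--D picture, the sketch below builds four commits in a throwaway repository and checks that the name main moves to the new commit, whose stored parent is the old tip. The repository location, file names, and identity are assumptions made for the demo.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main   # make sure the first branch is called main
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commits A, B and C, one new file each.
for name in A B C; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "commit $name"
done

tip_before=$(git rev-parse main)        # hash ID of C

echo "D" > D.txt
git add D.txt
git commit -q -m "commit D"

tip_after=$(git rev-parse main)         # hash ID of D
parent_of_tip=$(git rev-parse main~1)   # the parent stored inside D

echo "main before: $tip_before"
echo "main after:  $tip_after"
echo "parent of D: $parent_of_tip"
```

The "parent of D" line should show the same hash ID as "main before": the branch name was simply rewritten to hold D's hash ID.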

If we instead add a new branch, we might start out with this:

A--B--C   <-- main, develop

If we're on branch develop, as git status would say, and we run git commit and make a new commit D, Git makes the new commit in the same way as always, but this time the branch name that Git writes into is develop instead of main, producing:

A--B--C   <-- main
       \
        D   <-- develop

Note that, for Git to do anything useful with commit C, like show what changed in it, Git needs commit B as well. To make commit D, Git needs commit C first. In general, Git wants and needs every historical commit starting from the end-points—whose hash IDs are in the various branch names—and working backwards to the very first commit.

What this means in general is that in most Git repositories,[4] you have every commit leading up to the last ones. These commits are the history, in a Git repository. There is no such thing as "file history": each commit has a full snapshot of every file, as a sort of archive. The history is the set of commits, which Git finds by starting at the ends (from the branch names) and working backwards.


[3] This is all necessary in order for them to be universally unique.

[4] Git supports so-called shallow repositories, where the history cuts off at some point, but in general, you do not want to use these.


What this means for your git push

When you run git push, you are telling your Git to send some particular commit to some other Git repository. The syntax for this is:[5]

git push <repository> <refspec>

The <repository> part here can be a URL, or the name of a remote such as origin. Using a name adds various convenience features. Git will find the URL using that name, for instance, which avoids having to type the same long and error-prone URL repeatedly.[6]

The real magic is in the <refspec> part. A refspec can be:

  • a branch name, by itself; or
  • a raw commit hash ID, followed by a colon, followed by a reference name; or
  • the word HEAD, followed by a colon, followed by a reference name;

or several more options, most of which I won't go into here. You're trying to use the middle or last option, and when using these last two options, the name often has to be a fully qualified reference name. We'll get back to this in a moment, but before we do, let's look at what git push will do:

  • Given the hash ID, or the name HEAD, your Git will look up the corresponding commit.
  • Your Git will then offer that hash ID to the other Git. If they already have that commit, they will say, to your Git: No thanks, I already have that one. This has some implications.
  • If they don't have that one, your Git must now offer that commit's parent commit hash IDs. Most ordinary commits have just the one hash ID; merge commits have two or more; and root commits have none. Whatever the commit's type is, your Git is obligated to offer all the parents.

This repeats until they finally say that they do have the hash IDs, or your Git runs out of commits to offer because you've offered every historical commit leading up to and including the one you're pushing.

This allows your Git to know which files of yours their Git already has. That's the implication I mentioned above. If you offer commit C, and they don't have that, but then you offer commit B and they do have that, this tells your Git that they have commits B and A both, and therefore they have all the files that exist in commits B and A already. So your Git can now compress your commit C, knowing that they have commits A-B already and thus referring to the files in those existing commits.

Of course, if they don't have those commits—for instance, if this is a new, totally-empty repository—your Git will have to send every commit leading up to the final one you're sending.
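To make the "sends every commit leading up to it" behavior concrete, here is a small sketch that pushes the middle commit of a three-commit history into an empty bare repository and then counts what arrived. Both repositories are temporary directories created for the demo (a local path stands in for the remote URL), and the fully qualified destination used here is the form explained later in this answer.

```shell
set -eu

work=$(mktemp -d)     # stands in for your existing repository
remote=$(mktemp -d)   # stands in for the new, empty repository
git init -q --bare "$remote"

cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

for name in A B C; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "commit $name"
done

# Push commit B (the parent of the tip) by its raw hash ID.
hash_b=$(git rev-parse main~1)
git push -q "$remote" "$hash_b:refs/heads/main"

remote_count=$(git --git-dir="$remote" rev-list --count refs/heads/main)
local_count=$(git rev-list --count main)
echo "commits reachable from remote main: $remote_count"
echo "commits reachable from local main:  $local_count"
```

In this setup the receiving repository should end up with B and its ancestor A, while C stays only in the sending repository.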

Once this is all done, your Git now asks their Git to set one of their names. Let's finish off this section now, and describe reference names properly.


[5] There is more than one syntax; this is the general one that you'll need here.

[6] Using a remote name adds more convenience features than just this URL-shortening one, but that's all I will cover here.


Reference names

I mentioned above that Git uses each branch name to store the hash ID of the last commit we want to say is "on the branch", but also that Git has more than one kind of name. The other kinds of names include tag names, remote-tracking names, the "stash" that handles git stash, and many more. The three kinds of names that you, as a user, normally deal with are branch names, tag names, and remote-tracking names like origin/main.

Each of these names lives in a namespace. Branch names, in particular, live under refs/heads/, while tag names live under refs/tags/. This means that the branch main is really the name refs/heads/main. The tag v1.2 is really refs/tags/v1.2.
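One way to see these namespaces is to ask Git for the full name behind a short one. This sketch assumes a throwaway repository with a single commit, a branch named main, and a lightweight tag v1.2 created just for the demonstration.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "first commit"
git tag v1.2

# Expand each short name to its fully qualified reference name.
branch_full=$(git rev-parse --symbolic-full-name main)
tag_full=$(git rev-parse --symbolic-full-name v1.2)
echo "main -> $branch_full"
echo "v1.2 -> $tag_full"

# List every reference in the repository by its full name.
git for-each-ref --format='%(refname)'
```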

Most of the time, when you run git push, you are asking your Git to send commits from one or more of your branches to some other Git, and when you do that, you want them to set one of their branches to remember the same last commit. When you're doing that with, e.g.:

git push origin main

or:

git push origin develop

you generally want them to set their branch of the same name. So here, Git lets you leave out the refs/heads/ part and the colon and the entire other part. Your Git figures out that main really means refs/heads/main:refs/heads/main. That is, you want your branch name, main, to determine the last commit you'll send, and you then want your Git to ask their Git to set their branch name main, too.

You, however, want to use a raw hash ID. Your Git doesn't know, then, whether it should ask their Git to set a tag name, or a branch name, or some other kind of name entirely. What you need to do is use a fully-qualified name:

git push <url-or-remote> <hash>:refs/heads/somebranch

This will ask their Git to create or update a branch name, somebranch, in their Git repository, to remember as its last commit, the commit whose hash ID you used in the git push line. It will have the side effect of sending that commit and all of its history if necessary.
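The difference between the short and the fully qualified destination can be reproduced locally. The sketch below is only an approximation of the situation in the question: it uses an empty bare repository in a temporary directory instead of a GitHub URL, and the branch name somebranch is hypothetical.

```shell
set -eu

work=$(mktemp -d)
remote=$(mktemp -d)
git init -q --bare "$remote"

cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

for name in A B C; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "commit $name"
done

hash_b=$(git rev-parse main~1)

# Short destination: the source is a raw hash ID and the receiving side has
# no ref called "somebranch", so Git cannot tell which namespace is meant.
if git push -q "$remote" "$hash_b:somebranch" 2>/dev/null; then
    short_name_result="accepted"
else
    short_name_result="rejected"
fi
echo "push to <hash>:somebranch was $short_name_result"

# Fully qualified destination: creates the branch on the receiving side.
git push -q "$remote" "$hash_b:refs/heads/somebranch"
remote_tip=$(git --git-dir="$remote" rev-parse refs/heads/somebranch)
echo "remote somebranch now points to: $remote_tip"
```

The first push should be refused with the same "not a full refname" error shown in the question (silenced here), and the second should leave somebranch on the receiving side pointing at the pushed hash ID.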

You literally cannot push uncommitted changes

Note that when you run git push, what your Git sends are commits. It sends the commits (the snapshots with metadata) to the other Git, which then stores them in a quarantine area for a moment. Your Git does not send changes, but rather whole commits.[7] Your Git then asks their Git to create or update some reference name (a branch name, tag name, or whatever other kind of name you like) so as to remember the specific commit you named in your push.

If you have uncommitted code, this stuff is not in Git. Your Git literally can't send it yet. To send it, your Git would have to commit it first.[8] Pushing does not touch your working tree or index either, so that uncommitted work stays exactly where it is. But you will be sending the entire history of commits leading up to the commit you push to your other repository. If that's what you want (often, it is), you're all good: go for it.
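Here is a hedged sketch of the scenario from the question: an older commit is pushed by hash ID while the sending repository has an unpushed commit, a modified tracked file, and an untracked file. Everything lives in temporary directories, and the file names are made up for the demo.

```shell
set -eu

work=$(mktemp -d)
remote=$(mktemp -d)
git init -q --bare "$remote"

cd "$work"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

for name in A B C; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "commit $name"
done

# Uncommitted work: one modified tracked file and one untracked file.
echo "work in progress" >> C.txt
echo "scratch" > notes.txt

status_before=$(git status --porcelain)
head_before=$(git rev-parse HEAD)

# Push the older commit B; commit C and the uncommitted work are not involved.
hash_b=$(git rev-parse main~1)
git push -q "$remote" "$hash_b:refs/heads/main"

status_after=$(git status --porcelain)
head_after=$(git rev-parse HEAD)
remote_files=$(git --git-dir="$remote" ls-tree --name-only refs/heads/main)

echo "working tree status after push:"
printf '%s\n' "$status_after"
echo "files in the pushed snapshot:"
printf '%s\n' "$remote_files"
```

The status output and HEAD should be identical before and after the push, and the pushed snapshot should contain only the files committed up to B.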


[7] Your Git does use compression, which may turn the whole commit into changes across the push operation, but these changes, if any get constructed, depend on what commits their Git has, that your Git knows about. Their Git may need to re-expand these and then re-compress them later, depending on many factors.

[8] It's possible to make temporary commits that are not on any branch, of course; your Git could then send those. Such temporary-commits-not-on-any-branch are how git stash works, for instance. But Git does not do this today, and the receiving Git would need to use some name to remember them, which means we're right back to the whole refspec issue.

Upvotes: 5
