Reputation: 92150
Imagine I have branch develop and branch feature. On feature there's a commit shaX. Someone merged develop on feature with --rebase and pushed, thus the feature commit is shaY now and history has been rewritten.
Currently, I only have shaY. But I'd like to recreate the original feature branch from shaX.
How can I get shaX when I only know shaY?
I can (if required) accept a solution done on the computer of the dev who made the bad merge, but I would prefer a solution that could be done on any machine (if possible), both for my own knowledge and because that developer could be gone or his machine destroyed.
Upvotes: 0
Views: 193
Reputation: 488461
As Mort said in a comment, you cannot necessarily do this on any machine.
Really, we must consider not machines but rather repositories. The question becomes: which set of repositories has or had X, and how do we find it? It doesn't matter too much what we call this ("who" or "what" or "repositories" or whatever) as long as we keep in mind that once we find this set, it's really a list of individual repositories.
Since Git is distributed, there may be many copies of the repository. Each copy is potentially a little bit different. Some number of copies—although the number could be zero—will have the original commit, and some won't. The reason the first number could be zero is just what you said:
... that developer could be gone or his [copy] destroyed
If he was the only one with a viable copy, and it's now gone, you're down to zero copies of the original and out of luck. But since he did the original operation in his Git repository, we know he must have, at some point, had the original commit with hash X. So the question expands to: who else—what other repositories—might have X? Second, for those that might have it, how do we find it?
The answer to the first question starts with this: Anyone who picked it up, had it at some point. If so, they may still have it. If not, they obviously do not have it.
The way most people pick up commits, in general, is by running git fetch. (Note that git clone is a wrapper that, once it's created a new repository, runs git fetch, so even git clone is a case of git fetch.) When you run git fetch, you instruct your Git to connect to another Git. Your Git asks the other Git: What branch and tag names do you have? Their Git sends yours a list, and your Git picks out the names it likes and asks their Git to send any reachable commits and other objects that go with those names, that your Git does not already have.
Once your Git has those objects, your Git takes those names and (typically) changes them in some way, and puts those names into your own repository. If they have a branch named feature, your Git takes their name feature and turns it into your own name origin/feature. Your Git then sets your own reference refs/remotes/origin/feature to the hash ID for the tip commit of their branch feature.
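The renaming can be watched in a throwaway pair of repositories. This is only a minimal sketch, assuming a POSIX shell and a reasonably recent git; the directory names upstream and client and the demo identity are made up for illustration.

```shell
set -e
work=$(mktemp -d)

# Demo identity so that commits work on a machine with no Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# "Their" repository: one commit on a branch named feature.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/feature
git -C "$work/upstream" commit -q --allow-empty -m "commit X"

# "Your" repository: fetch everything from upstream under the remote name origin.
git init -q "$work/client"
git -C "$work/client" remote add origin "$work/upstream"
git -C "$work/client" fetch -q origin

# List the references the fetch created on the client side.
client_refs=$(git -C "$work/client" for-each-ref --format='%(refname)')
echo "$client_refs"

# Their refs/heads/feature and your refs/remotes/origin/feature name the same commit.
upstream_tip=$(git -C "$work/upstream" rev-parse refs/heads/feature)
client_tip=$(git -C "$work/client" rev-parse refs/remotes/origin/feature)
echo "upstream feature:      $upstream_tip"
echo "client origin/feature: $client_tip"
```

The client ends up with a remote-tracking name under refs/remotes/origin/ and no local branch of its own, because fetching never creates refs/heads/ names here.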
(For tag names, your own Git typically makes no change, which is why tag names are "more global" than branch names.)
Typically, people running git fetch bring over either just one branch or all branches. For instance, if I run git pull origin master (which runs git fetch origin master), I instruct my Git to connect to your Git and ask what branches and tags and such you have, but then bring over only your master, which I then call my origin/master. Therefore, any Git that ran git fetch against another Git that had the "good" commit under the name feature at the time, and brought over feature, will have brought over "good commit X".
This gives you the candidate pool: who might have picked up X originally? Anyone who ran git fetch (including the fetches run by git clone and git pull, though the latter tends to restrict the fetch) against a repository that had X, while that repository had X.
There's one more set of repositories to consider, because there's one other way to acquire a commit: git push can send a commit. A git push operation is kind of like git fetch, except that the roles are reversed. Instead of having my Git talk to your Git so that my Git can get commits from yours, I have my Git talk to your Git so that my Git can give commits to yours. I offer the commit object(s) and any other object(s), and then I send a request: Please set your branch or tag name, such as refs/heads/feature, to this hash ID. With --force I send a command: Set your name to this hash ID! The force flag overrides the default "is a fast-forward" check done at the receiving end (but not any other checks enforced at the receiving end: the command can still be refused).
So anyone who received the good commit as a result of a git push could have it. This could still be true even if a subsequent force-push overwrote the name that would lead to good commit X, but by default it is not, for the reason given below in the discussion of reflogs.
Now that we have all our candidate repositories lined up, we need to inspect them. The place to look is in reflogs. A reflog is what Git uses to retain a history of what happened to a reference.
We have, above, been using "reference" without properly defining it. A reference is simply a name starting with refs/. Git finds almost everything through these references. Branch names, for instance, are references starting with refs/heads/, and tag names are references starting with refs/tags/. Each reference stores a (single) hash ID.
Whenever you have your Git update a reference—replace the current hash ID with a new one—you have the option of having your Git save the old value, in a log entry. These log entries are your reflogs.
Each reference has a reflog, and the special name HEAD also has a reflog. Each reflog has a potentially unlimited length, since Git just adds to it on each reference update. To keep the reflogs from consuming infinite space, Git assigns every reflog entry a time stamp as well. Reflog entries then eventually expire.
The expiration for any given reflog entry defaults to 30 or 90 days. This part gets a bit tricky and I don't have time to write it up here, so let's just go with the shorter time, of 30 days. While the reflog entry is alive, though, it retains the earlier hash IDs, and by retaining them, it keeps those objects "live" inside the repository database.
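The two defaults correspond to two configuration keys: gc.reflogExpire (90 days, for entries still reachable from the reference's current value) and gc.reflogExpireUnreachable (30 days, for entries that are not). A commit that was rebased away falls into the second group, which is why 30 days is the safer number to plan around. As a minimal sketch, run here against a scratch repository standing in for a candidate repository, you can check the settings and, optionally, stop entries from expiring while you investigate. The helper name show_expiry is made up for this example.

```shell
set -e
repo=$(mktemp -d)    # stand-in for a candidate repository
git init -q "$repo"

# Print the configured value, or note that Git's built-in default applies.
show_expiry() {
    git -C "$repo" config --get "$1" || echo "$1 is unset (built-in default: $2)"
}
show_expiry gc.reflogExpire "90 days"
show_expiry gc.reflogExpireUnreachable "30 days"

# Optional: keep unreachable entries in this one repository until you are done.
git -C "$repo" config gc.reflogExpireUnreachable never
unreachable_expiry=$(git -C "$repo" config --get gc.reflogExpireUnreachable)
echo "gc.reflogExpireUnreachable is now: $unreachable_expiry"
```

Changing the setting only protects entries that have not been pruned yet; it cannot bring back an entry that has already expired and been removed.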
Meanwhile, there are two other tricky parts:
Server repositories that receive git push requests (these are normally bare) will have reflogs disabled by default. This makes them unlikely to retain good commit X if they have had a bad replacement pushed.
Repositories that run git fetch record the update under the renamed reference, since their Git has renamed feature to origin/feature.
Hence, the place to look to see if you can find the commit with good hash X is in the reflogs, probably those for origin/feature, on each of the clients that has been running git fetch.
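The contrast between a pushing developer and a receiving server can be sketched with a bare repository standing in for the server. This assumes default settings (in particular, that core.logAllRefUpdates has not been enabled globally); the names server.git and dev and the demo identity are made up for illustration.

```shell
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A bare "server" and a developer repository that pushes to it.
git init -q --bare "$work/server.git"
git init -q "$work/dev"
git -C "$work/dev" symbolic-ref HEAD refs/heads/feature
git -C "$work/dev" remote add origin "$work/server.git"

# Publish good commit X, then rewrite it into Y and force-push.
git -C "$work/dev" commit -q --allow-empty -m "good commit X"
sha_x=$(git -C "$work/dev" rev-parse HEAD)
git -C "$work/dev" push -q origin feature
git -C "$work/dev" commit -q --amend --allow-empty -m "rewritten commit Y"
git -C "$work/dev" push -q --force origin feature

# The developer's own branch reflog still remembers X ...
dev_previous=$(git -C "$work/dev" rev-parse 'feature@{1}')

# ... but the bare server kept no reflog, so feature@{1} cannot be resolved there.
if git -C "$work/server.git" rev-parse -q --verify 'feature@{1}' >/dev/null 2>&1; then
    server_has_reflog=yes
else
    server_has_reflog=no
fi
echo "dev feature@{1}: $dev_previous"
echo "server remembers previous value: $server_has_reflog"
```

This is also why the repository where the bad rebase was done is the easiest place to look: its local branch reflog records the value from before the rewrite.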
The command to view the reflog, on any given Git repository, is git reflog reference. (This actually runs git log -g, so it is affected by any git log options in the user's configuration.) Here's a snippet of git reflog origin/master and git reflog origin/pu on my copy of the Git repository for Git:
3dc57ebfb refs/remotes/origin/master@{2}: fetch: fast-forward
ca0964be6 refs/remotes/origin/pu@{1}: fetch: forced-update
I show these to illustrate the difference between a more normal fetch, which brings in new commits without discarding old ones, and a forced update: origin/master@{2} points to commit 3dc57ebfb, and when that update happened, it was a normal fast-forward; but origin/pu@{1} points to ca0964be6, and when that update happened, it was a "forced update".
If someone picked up good commit X, and then feature was rebased and force-pushed to origin and that someone picked up the new bad commit chain, their origin/feature will have experienced a forced update. This means you will see a forced update in the reflog, which in turn means that the reflog entry just before it has some chance of pointing to good commit X, or to a commit whose history leads to good commit X.
Whether it actually points directly to X or to some descendant of X is a matter of chance, and you will have to inspect these commits manually to find out.
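The whole search can be rehearsed end to end in scratch repositories. The following is a minimal sketch under stated assumptions: a POSIX shell, a reasonably recent git, and made-up names (upstream, colleague, feature-original, the demo identity). The rewrite is simulated with commit --amend rather than a real rebase, which has the same effect on the remote-tracking name.

```shell
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Upstream starts with good commit X on feature.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/feature
git -C "$work/upstream" commit -q --allow-empty -m "good commit X"
sha_x=$(git -C "$work/upstream" rev-parse HEAD)

# A colleague fetches while upstream still has X.
git init -q "$work/colleague"
git -C "$work/colleague" remote add origin "$work/upstream"
git -C "$work/colleague" fetch -q origin

# History is rewritten upstream (X becomes Y), and the colleague fetches again.
git -C "$work/upstream" commit -q --amend --allow-empty -m "rewritten commit Y"
sha_y=$(git -C "$work/upstream" rev-parse HEAD)
git -C "$work/colleague" fetch -q origin

# Look for a forced update in the colleague's reflog for origin/feature.
git -C "$work/colleague" reflog show origin/feature

# The entry before the forced update is the candidate: inspect it, then name it.
candidate=$(git -C "$work/colleague" rev-parse 'origin/feature@{1}')
git -C "$work/colleague" log --oneline -n 5 "$candidate"
git -C "$work/colleague" branch feature-original "$candidate"
```

On a real candidate repository only the last four commands matter; everything before them just builds the scenario. In a longer-lived repository the forced update may not be the most recent entry, so the index in origin/feature@{1} has to be read off the reflog listing rather than assumed.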
Upvotes: 1