Reputation: 17138
I have screwed up my GitHub repository. I have pushed things to GitHub (not just local) that I wish I hadn't. The repository has 20 "steps", with Step1 merging into Step2, which merges into Step3, and so on, all the way to Step20. It's quite badly messed up.
I want my entire repository to be reverted back to how things were on December 10, for all the branches in the whole repository.
I've seen a number of ways to do this for a given branch, and I guess I can do that twenty times, right?
However, I'm hoping there is a way to do it without checking out all twenty branches and setting each one back.
Upvotes: 1
Reputation: 489253
Unless you have ever removed commits or deleted some branch name(s), or deleted and re-cloned your repository, this is actually quite easy for a typical setup. This is in part due to:
... December 10 ...
which, at this time, is just ten days ago. Things get harder after 30 or 90 days, due to reflog entry expiration. (This assumes the default configuration in which you have not told Git to use a different retention time period.)
The thing to remember here is that what Git stores is not changes and not files, but rather commits. (Each commit stores files, so that Git indirectly stores files, but the level at which things are visible is the commit.) Every commit is uniquely identified by some hash ID, which is a big ugly string like 5d826e972970a784bd7a7bdf587512510097b8c7. Normally, too, you only ever add new commits to a Git repository. This is true even with operations that seem to remove commits, such as git commit --amend or git rebase. The original commits, with their unchanging hash IDs that are exactly as permanent as the commits themselves, are still in your repository.
Each of your branch names simply acts as a pointer. That is, each one stores one hash ID. The one hash ID stored in master may be 5d826e972970a784bd7a7bdf587512510097b8c7 today. Tomorrow, after you make a new commit, it will be something else, but there will still only be one hash ID in master.
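To watch a branch name act as a pointer, here is a minimal sketch in a throwaway repository. The temporary directory, the identity settings and the commit messages are made up for illustration; nothing here touches your real repository.

```shell
# Throwaway repository, created only for this demonstration.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example"
git config user.email "example@example.com"

git commit -q --allow-empty -m "first"
branch=$(git symbolic-ref --short HEAD)   # master or main, depending on configuration
before=$(git rev-parse "$branch")         # the one hash ID the name holds now

git commit -q --allow-empty -m "second"
after=$(git rev-parse "$branch")          # the one hash ID it holds afterward
parent=$(git rev-parse "$branch^")        # the new commit points back at the old one

echo "before: $before"
echo "after:  $after"
echo "parent: $parent"
```

The name holds a different hash ID after the second commit, and the parent of the new commit is the hash ID the name held before.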
When you make a new commit, Git:
1. Packages up the full snapshot (from your index, aka staging area aka cache). This makes a Git tree object (not something you normally need to care about) that gets its own hash ID.
2. Collects your log message, your name and email address and the current date/time, and so on. To this, Git adds the tree hash from step 1, a parent line recording the current commit's hash ID from the branch name, and any other suitable metadata that Git wants to include. From all of this data, Git makes a commit object, which acquires its own new, unique hash ID.
   (Part of the reason for the time stamp is to make sure that the hash ID is unique. Otherwise, if you made a new snapshot that matched an old one, with the same parent and other metadata, you'd get the old commit's hash ID back! That would make two branches collapse into one branch. This isn't actually fatal: you can make this condition occur via trickery and scripting to make more than one commit per second, for instance, and it does actually work. It is deeply surprising, though, and has the potential to break some workflows.)
3. Records the hash ID from step 2 into the current branch name. Voila, the branch now points to the new commit. The new commit points back to the previous commit.
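The three steps can be approximated by hand with Git's plumbing commands. This is only a sketch of the idea, not what git commit literally runs: it assumes a throwaway repository, and the file name, identity and messages are invented for the example.

```shell
# Throwaway repository with one ordinary commit to build on.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example"
git config user.email "example@example.com"
echo "hello" > file.txt
git add file.txt
git commit -q -m "initial"

# Stage a change, so the index holds a new snapshot.
echo "more" >> file.txt
git add file.txt

ref=$(git symbolic-ref HEAD)    # full name of the current branch, e.g. refs/heads/master
old=$(git rev-parse HEAD)       # the commit the branch name holds right now

# Step 1: turn the index into a tree object.
tree=$(git write-tree)
# Step 2: build a commit object from the tree, the parent, and the metadata.
new=$(git commit-tree "$tree" -p "$old" -m "by hand")
# Step 3: store the new hash ID in the branch name (only if it still holds $old).
git update-ref -m "commit: by hand" "$ref" "$new" "$old"

git log --oneline
```

The -m argument to git update-ref is the message that ends up in the branch's reflog.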
So, before making this new commit, if the name master points to some commit H (H here stands in for the actual hash ID) whose parent is G, with G's parent being F, and so on, you have:
... <-F <-G <-H <--master
Afterward, if we have I stand in for the new commit, the picture is:
... <-F <-G <-H <-I <--master
Whenever Git updates a reference, such as a branch name like master, Git records the previous value of that branch name into a log. This log is called the reflog for the reference. Each log entry has a date/time stamp on it as well, so that you can run git reflog master to have it spill out:
$ git reflog master
5d826e9729 master@{0}: merge refs/remotes/origin/master: Fast-forward
965798d1f2 master@{1}: merge refs/remotes/origin/master: Fast-forward
8a0ba68f6d master@{2}: merge refs/remotes/origin/master: Fast-forward
...
(This is my Git repository for Git, where not-very-interesting things happen on master: basically, I just fast-forward it all the time.)
These entries are numbered according to their recency in the log: @{0} is the current value, @{1} is the previous, @{2} is what had been previous when @{1} was @{0}, and so on. The default for git reflog is to print them out numbered, like this, but with --date=relative, it prints their time stamps instead:
5d826e9729 master@{11 days ago}: merge refs/remotes/origin/master: Fast-forward
965798d1f2 master@{12 days ago}: merge refs/remotes/origin/master: Fast-forward
and so on.
You can also use --date=local, and many other formats. To see them all, read the git log documentation (git reflog is actually git log -g). Try it with --date=local now though.
Imagine, then, that we have something simple like this:
...--F--G--H--I <--master
(I've stopped drawing the internal arrows because it's too hard / annoying, as you'll see in a moment. Just remember, they necessarily point backwards: you cannot make an old commit that refers forwards to a new commit that does not yet exist, because you don't know the hash ID of the new commit until after you make it. Then it's too late, because nothing in any commit—or in fact, in any Git internal object—can ever be changed.)
In order to roll master back to the way it was yesterday, when it pointed to H instead of I, you just need to tell Git: set the name master to point to commit H now. The result is:
             I
            /
...--F--G--H   <-- master
Commit I isn't gone. In fact, this just added a new entry to the reflog, so that master@{1} remembers the hash ID of master as of the time it pointed to I. (That new reflog entry has the date-and-time stamp of "now".)
Eventually (after at least 30 days by default, and 90 days for most typical cases) Git will scan through the reflog for master and discard any entries that are too old. But until at least 30 days from now, master@{<something>} will remember the hash ID of commit I.
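Here is a small sketch of that rollback in a throwaway repository, with two empty commits standing in for H and I. Since the branch being moved is the one checked out, the sketch uses git reset --hard rather than git branch -f, which refuses to move the current branch; that substitution, and every name in the example, is an assumption made for the demonstration.

```shell
# Throwaway repository with two commits standing in for H and I.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "H"
git commit -q --allow-empty -m "I"

branch=$(git symbolic-ref --short HEAD)
commit_i=$(git rev-parse "$branch")
commit_h=$(git rev-parse "$branch^")

# Point the current branch name back at H.
git reset -q --hard "$commit_h"

now=$(git rev-parse "$branch")            # the branch names H again
previous=$(git rev-parse "$branch@{1}")   # the reflog still remembers I

git reflog "$branch"
```

After the reset, the branch name holds H while its @{1} reflog entry holds I, so the rollback itself can be undone for as long as that entry survives.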
All we have to do, then, is find out what all the reflogs say about all the branches. For each branch that existed ten days ago, which commit did that branch point to?
You could do this by enumerating all the branch names:
$ git branch
<list of names spills out, one of them probably marked `*` as current branch>
Then, for each name, you can run git reflog --date=local name and look for the entry that corresponds to December 10th. This might be dated before the 10th, as in this case:
$ git reflog --date=local master
5d826e9729 master@{Sun Dec 9 11:03:46 2018}: merge refs/remotes/origin/master: Fast-forward
965798d1f2 master@{Sat Dec 8 07:53:19 2018}: merge refs/remotes/origin/master: Fast-forward
This means 5d826e9729 is the value master held on the 9th, with nothing newer since then, so it must also be the value master held on the 10th.
You can repeat this for every branch name and figure out which commit you want each of those names to identify. If the branch didn't exist before the 10th, you may want to delete the branch entirely; be aware that if you do so, Git also deletes its reflog, so there is no going back from this.
There is, however, an easier way. The reflog syntax allows you to write:
master@{10.days.ago}
or:
master@{10.dec.2018}
to have Git itself figure this out! In general, any Git command that takes a hash ID takes this as well. For instance, git rev-parse turns a name into the corresponding hash ID:
$ git rev-parse master
5d826e972970a784bd7a7bdf587512510097b8c7
but:
$ git rev-parse master@{8.dec.2018}
965798d1f2992a4bdadb81eba195a7d465b6454a
I used 8 dec here since my reflog doesn't have anything interesting later than that. Note that what came out is 96579..., which is master@{1}, which is from the 8th.
Watch out though: if you have more than one value on your selected day, Git will pick one of them. In fact, 8.dec.2018 means midnight on that day, and you can add more time specifiers if you want 0900 or 1432 or whatever. Beware of time zone issues as well, and make sure you pick the right reflog entry!
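To see how a full time of day selects between two values from the same day, here is a sketch in a throwaway repository. It relies on the assumption that reflog time stamps follow GIT_COMMITTER_DATE, which is what lets the example backdate its two commits; the times, identity and messages are invented, and both the backdated stamps and the lookups are read in your local time zone.

```shell
# Throwaway repository with two backdated commits on the same day.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Assumption: the reflog entry for each commit carries the committer date set here.
GIT_COMMITTER_DATE="2018-12-10 08:00:00" git commit -q --allow-empty -m "morning"
GIT_COMMITTER_DATE="2018-12-10 15:00:00" git commit -q --allow-empty -m "afternoon"

branch=$(git symbolic-ref --short HEAD)
morning=$(git rev-parse "$branch^")
afternoon=$(git rev-parse "$branch")

# Ask what the branch held at two different times on that day.
at_0900=$(git rev-parse "$branch@{2018-12-10 09:00:00}")
at_1600=$(git rev-parse "$branch@{2018-12-10 16:00:00}")

echo "09:00 -> $at_0900"
echo "16:00 -> $at_1600"
```

Spelling out the time removes any doubt about which of the day's entries Git will choose.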
Instead of manually invoking git branch, copying the names, and then manually doing each one, you can start with this:
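# Dry run: echo only prints each command. Note that git branch -f refuses to move the checked-out branch.
git for-each-ref --format='%(refname:short)' refs/heads |
    while read -r branch; do echo git branch -f "$branch" "$branch@{10.dec}"; done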
which prints out the commands that would run if the echo were taken out.
If they look OK, and you're really sure that this is all correct, you can now take out the echo to make the commands actually happen.
(This assumes an sh/bash compatible command line interpreter.)
Upvotes: 2