Nick Hodges

How to revert an entire git repository -- including all branches -- to a previous date?

I have screwed up my GitHub repository. I have pushed things to GitHub (not just local) that I wish I hadn't. The repository has 20 "steps", with Step1 merging into Step2, which merges into Step3, and so on, all the way to Step20. It's quite badly messed up.

I want my entire repository to be reverted back to how things were on December 10, for all the branches in the whole repository.

I've seen a number of ways to do this for a given branch, and I guess I can do that twenty times, right?

However, I'm hoping there is a way to do it without checking out all twenty branches and setting each one back.


Answers (1)

torek

Unless you have ever removed commits or deleted some branch name(s), or deleted and re-cloned your repository, this is actually quite easy for a typical setup. This is in part due to:

... December 10 ...

which, at this time, is just ten days ago. Things get harder after 30 or 90 days, due to reflog entry expiration. (This assumes the default configuration in which you have not told Git to use a different retention time period.)

Background

The thing to remember here is that what Git stores is not changes and not files, but rather commits. (Each commit stores files, so that Git indirectly stores files, but the level at which things are visible is the commit.) Every commit is uniquely identified by some hash ID, which is a big ugly string like 5d826e972970a784bd7a7bdf587512510097b8c7. Normally, too, you only ever add new commits to a Git repository. This is true even with operations that seem to remove commits, such as git commit --amend or git rebase. The original commits—with their unchanging hash IDs that are exactly as permanent as the commits themselves—are still in your repository.
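To see that an operation like git commit --amend adds a commit instead of changing one, you can try it in a throwaway repository. This is a minimal sketch assuming a POSIX shell and Git on the PATH; the temporary directory, file name and "Demo User" identity are invented for the demo.

```shell
#!/bin/sh
set -e

# Throwaway repository with a made-up identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "first version"
old=$(git rev-parse HEAD)

# --amend builds a brand-new commit and moves the branch name to it.
echo two > file.txt
git commit -q -a --amend -m "amended version"
new=$(git rev-parse HEAD)

echo "before amend: $old"
echo "after amend:  $new"

# The original commit object is still in the repository.
git cat-file -t "$old"   # prints: commit
```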

Each of your branch names simply acts as a pointer. That is, each one stores one hash ID. The one hash ID stored in master may be 5d826e972970a784bd7a7bdf587512510097b8c7 today. Tomorrow, after you make a new commit, it will be something else, but there will still only be one hash ID in master.

When you make a new commit, Git:

  1. Packages up the full snapshot (from your index, aka staging area aka cache). This makes a Git tree object (not something you normally need to care about) that gets its own hash ID.

  2. Collects your log message, your name and email address and the current date/time, and so on. To this, Git adds the tree hash from step 1, a parent line recording the current commit's hash ID from the branch name, and any other suitable metadata that Git wants to include. From all of this data, Git makes a commit object, which acquires its own new, unique hash ID.

    (Part of the reason for the time stamp is to make sure that the hash ID is unique. Otherwise, if you made a new snapshot that matched an old one, with the same parent and other metadata, you'd get the old commit's hash ID back! That would make two branches collapse into one branch. This isn't actually fatal—you can make this condition occur via trickery and scripting to make more than one commit per second, for instance, and it does actually work—but it's deeply surprising and has the potential to break some workflows.)

  3. Records the hash ID from step 2 into the current branch name. Voila, the branch now points to the new commit. The new commit points back to the previous commit.
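The three steps can be replayed by hand with Git's plumbing commands: git write-tree for step 1, git commit-tree for step 2 and git update-ref for step 3. The sketch below assumes a POSIX shell; the repository, file and identity are made up, and the dates are pinned through GIT_AUTHOR_DATE and GIT_COMMITTER_DATE only so that the time stamp point from step 2 can be checked: identical inputs give an identical hash ID, while a one-second difference gives a new one.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo hello > greeting.txt
git add greeting.txt
git commit -q -m "initial"
parent=$(git rev-parse HEAD)

echo world >> greeting.txt
git add greeting.txt

# Step 1: write the index (staging area) out as a tree object.
tree=$(git write-tree)

# Step 2: make a commit object from tree, parent, message and
# identity/time stamp. Dates are pinned here only for the demo.
export GIT_AUTHOR_DATE="2018-12-10 12:00:00 +0000"
export GIT_COMMITTER_DATE="2018-12-10 12:00:00 +0000"
commit=$(git commit-tree -p "$parent" -m "second commit" "$tree")

# Same inputs and same time stamp: Git computes the very same hash ID.
again=$(git commit-tree -p "$parent" -m "second commit" "$tree")

# One second later: a different commit object with a different hash ID.
export GIT_COMMITTER_DATE="2018-12-10 12:00:01 +0000"
later=$(git commit-tree -p "$parent" -m "second commit" "$tree")

# Step 3: store the new hash ID in the current branch (HEAD resolves to it).
git update-ref -m "commit: second commit" HEAD "$commit" "$parent"

git cat-file -p "$commit"   # shows the tree, parent, author and committer lines
```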

So, before making this new commit, if the name master points to some commit H (H here stands in for the actual hash ID) whose parent is G, with G's parent being F, and so on, you have:

... <-F <-G <-H   <--master

Afterward, if we have I stand in for the new commit, the picture is:

... <-F <-G <-H <-I   <--master

Reflogs

Whenever Git updates a reference, such as a branch name like master, Git records the previous value of that branch name into a log. This log is called the reflog for the reference. Each log entry has a date/time stamp on it as well, so that you can run git reflog master to have it spill out:

$ git reflog master
5d826e9729 master@{0}: merge refs/remotes/origin/master: Fast-forward
965798d1f2 master@{1}: merge refs/remotes/origin/master: Fast-forward
8a0ba68f6d master@{2}: merge refs/remotes/origin/master: Fast-forward
...

(This is my Git repository for Git, where not-very-interesting things happen on master: basically, I just fast-forward it all the time.)

These entries are numbered according to their recency in the log: @{0} is the current value, @{1} is the previous, @{2} is what had been previous when @{1} was @{0} and so on. The default for git reflog is to print them out numbered, like this, but with --date=relative, it prints their time stamps instead:

5d826e9729 master@{11 days ago}: merge refs/remotes/origin/master: Fast-forward
965798d1f2 master@{12 days ago}: merge refs/remotes/origin/master: Fast-forward

and so on.

You can also use --date=local, and many other formats. To see them all, read the git log documentation (git reflog is actually git log -g). Try it with --date=local now though.
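To try reflogs without waiting for real days to pass, you can pin the time stamps. This sketch assumes that reflog entries take their date from the committer date, which Git lets you override with GIT_COMMITTER_DATE; the repository, file name and dates are hypothetical, and TZ=UTC keeps the --date=local output the same everywhere.

```shell
#!/bin/sh
set -e
export TZ=UTC   # so printed dates do not depend on your time zone

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use "master" whatever init.defaultBranch says
git config user.name "Demo User"
git config user.email "demo@example.com"

# Each commit updates master and adds a reflog entry stamped with the
# (pinned, made-up) committer date.
for day in 08 09 10; do
    echo "$day" > day.txt
    git add day.txt
    GIT_COMMITTER_DATE="2018-12-$day 09:00:00 +0000" git commit -q -m "commit on Dec $day"
done

git reflog master                # numbered: master@{0}, master@{1}, ...
git reflog --date=local master   # the same entries, with time stamps
git rev-parse 'master@{1}'       # the value master held before its last update
```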

Your job

Imagine, then, that we have something simple like this:

...--F--G--H--I   <--master

(I've stopped drawing the internal arrows because it's too hard / annoying, as you'll see in a moment. Just remember, they necessarily point backwards: you cannot make an old commit that refers forwards to a new commit that does not yet exist, because you don't know the hash ID of the new commit until after you make it. Then it's too late, because nothing in any commit—or in fact, in any Git internal object—can ever be changed.)

In order to roll master back to the way it was yesterday, when it pointed to H instead of I, you just need to tell Git: set the name master to point to commit H now. The result is:

             I
            /
...--F--G--H   <-- master

Commit I isn't gone. In fact, this just added a new entry to the reflog, so that master@{1} remembers the hash ID of master as of the time it pointed to I. (That new reflog entry has the date-and-time stamp of "now".)
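Here is the same roll-back in a throwaway repository, with four commits standing in for F, G, H and I. Since master is the checked-out branch in this sketch, it uses git reset --hard to move it; the repository and file names are invented for the demo.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

# Four commits standing in for F, G, H and I.
for name in F G H I; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "commit $name"
done
I=$(git rev-parse master)
H=$(git rev-parse master~1)

# master is checked out, so move it (together with index and work tree)
# using reset; this also adds a new reflog entry for master.
git reset -q --hard "$H"

git rev-parse master         # now the hash ID of H
git rev-parse 'master@{1}'   # the reflog still remembers I
git cat-file -t "$I"         # prints: commit (I still exists)
```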

Eventually—after at least 30 days by default, and 90 days for most typical cases—Git will scan through the reflog for master and discard any entries that are too old. But until at least 30 days from now, master@{<something>} will remember the hash ID of commit I.

All we have to do, then, is find out what all the reflogs say about all the branches. For each branch that existed ten days ago, which commit did that branch point to?

You could do this by enumerating all the branch names:

$ git branch
<list of names spills out, one of them probably marked `*` as current branch>

Then, for each name, you can run git reflog --date=local name and look for the entry that corresponds to December 10th. This might be dated before the 10th, as in this case:

$ git reflog --date=local master
5d826e9729 master@{Sun Dec 9 11:03:46 2018}: merge refs/remotes/origin/master: Fast-forward
965798d1f2 master@{Sat Dec 8 07:53:19 2018}: merge refs/remotes/origin/master: Fast-forward

This means 5d826e9729 is the value master held on the 9th, with nothing newer since then, so it must also be the value master held on the 10th.

You can repeat this for every branch name and figure out which commit you want each of those names to identify. If the branch didn't exist before the 10th, you may want to delete the branch entirely; be aware that if you do so, Git also deletes its reflog, so there is no going back from this.
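The manual survey can be scripted by looping over the branch names and printing the last few reflog entries of each. In this sketch the branches Step1 and Step2, the file and the pinned GIT_COMMITTER_DATE values are hypothetical, chosen only so the loop has something dated to show.

```shell
#!/bin/sh
set -e
export TZ=UTC

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

# Hypothetical history: two branches created on Dec 8, master moved on Dec 12.
echo base > file.txt
git add file.txt
GIT_COMMITTER_DATE="2018-12-08 07:53:19 +0000" git commit -q -m base
GIT_COMMITTER_DATE="2018-12-08 07:53:19 +0000" git branch Step1
GIT_COMMITTER_DATE="2018-12-08 07:53:19 +0000" git branch Step2
echo later > file.txt
GIT_COMMITTER_DATE="2018-12-12 10:00:00 +0000" git commit -q -a -m later

# Print the most recent reflog entries of every local branch, with
# dates, so you can spot the value each one held on the 10th.
git for-each-ref --format='%(refname:short)' refs/heads |
while read -r branch; do
    echo "== $branch"
    git reflog --date=local -n 3 "$branch"
done
```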

There is, however, an easier way. The reflog syntax allows you to write:

master@{10.days.ago}

or:

master@{10.dec.2018}

to have Git itself figure this out! In general, any Git command that takes a hash ID takes this as well. For instance, git rev-parse turns a name into the corresponding hash ID:

$ git rev-parse master
5d826e972970a784bd7a7bdf587512510097b8c7

but:

$ git rev-parse master@{8.dec.2018}
965798d1f2992a4bdadb81eba195a7d465b6454a

I used 8 dec here since my reflog doesn't have anything interesting later than that. Note that what came out is 96579..., which is master@{1}, which is from the 8th.

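Watch out, though: if a branch held more than one value on your selected day, Git picks just one of them, based on the time of day it assumes. A bare date such as 8.dec.2018 does not pin that time down (Git's date parser may fill in the current time of day rather than midnight, which would explain why the lookup above returned the value from 07:53 on the 8th), so add an explicit time such as 09:00 or 14:32 when it matters. Beware of time zone issues as well, and make sure you pick the right reflog entry!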
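The following sketch rebuilds a small reflog with dates like the ones above (pinned via GIT_COMMITTER_DATE, which is a demo trick, not something to do in a real repository) and then looks values up by date. It uses explicit times of day, and sets TZ=UTC so the dates are read in a known time zone; the file name and dates are hypothetical.

```shell
#!/bin/sh
set -e
export TZ=UTC   # interpret the lookup dates below in UTC

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo 8 > file.txt
git add file.txt
GIT_COMMITTER_DATE="2018-12-08 07:53:19 +0000" git commit -q -m "Dec 8"
dec8=$(git rev-parse master)

echo 9 > file.txt
GIT_COMMITTER_DATE="2018-12-09 11:03:46 +0000" git commit -q -a -m "Dec 9"
dec9=$(git rev-parse master)

echo 12 > file.txt
GIT_COMMITTER_DATE="2018-12-12 15:00:00 +0000" git commit -q -a -m "Dec 12"

# Explicit times avoid depending on which time of day Git assumes.
git rev-parse 'master@{2018-12-10 12:00}'   # the Dec 9 value
git rev-parse 'master@{2018-12-08 12:00}'   # the Dec 8 value
```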

Automating this

Instead of manually invoking git branch, copying the names, and then manually doing each one, you can start with this:

git for-each-ref --format='%(refname:short)' refs/heads |
    while read -r branch; do echo git branch -f "$branch" "$branch@{10.dec}"; done

which prints out the commands that it would run, if it had echo taken out.

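If they look OK, and you're really sure that this is all correct, you can now take out the echo to make the commands actually happen. One catch: git branch -f refuses to move the branch you currently have checked out, so either detach HEAD first with git checkout --detach, or move that one branch with git reset --hard instead.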

(This assumes an sh/bash compatible command line interpreter.)
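Putting it together, here is a sketch that runs the loop for real in a scratch repository. It assumes a POSIX shell; the Step1 to Step3 branches, the dates and the cut-off time in the hypothetical when variable are made up. It detaches HEAD first because git branch -f will not move the branch that is checked out, and it echoes each command before running it.

```shell
#!/bin/sh
set -e
export TZ=UTC

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

# Hypothetical history: master and Step1..Step3 are fine on Dec 9 ...
export GIT_COMMITTER_DATE="2018-12-09 10:00:00 +0000"
echo good > file.txt
git add file.txt
git commit -q -m good
good=$(git rev-parse HEAD)
for b in Step1 Step2 Step3; do git branch "$b"; done

# ... and all of them get "messed up" on Dec 12.
export GIT_COMMITTER_DATE="2018-12-12 10:00:00 +0000"
echo bad > file.txt
git commit -q -a -m bad
for b in Step1 Step2 Step3; do git branch -f "$b" master; done
unset GIT_COMMITTER_DATE

when='2018-12-10 12:00'   # hypothetical cut-off, with an explicit time

# git branch -f will not move the checked-out branch, so detach HEAD first.
git checkout -q --detach

git for-each-ref --format='%(refname:short)' refs/heads |
while read -r branch; do
    echo "git branch -f $branch $branch@{$when}"
    git branch -f "$branch" "$branch@{$when}"
done

# Re-attach to the (now rewound) master branch.
git checkout -q master
```

To make GitHub match afterward, the rewound branches would have to be force-pushed, for instance with git push --force origin --all. That overwrites every branch on GitHub with your local copy, so check the results carefully first, and warn anyone else who has cloned the repository.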
