
Git undo changes for certain file

I am new to Git and have a question about undoing changes with the git reset --hard hexid workflow. For example, I am working on two files, fileA and fileB. My workflow might be:

change fileA -> commit fileA
change fileB -> commit fileB

Now suppose I'm satisfied with my change in fileB, but I find I actually don't need to change fileA and want it back at its previous version. Git seems to take snapshots of the entire working directory, so git reset --hard hexid_of_step1 would take both files back to their original states. Is there any way to discard just the changes in fileA while keeping the changes to fileB?


TTT


Let's refer to your three states like this (most recent first):

commit C: your changes to fileB
commit B: your changes to fileA
commit A: where you started before you made any changes

Right now your branch looks like A-B-C, but it sounds like you want A-C' instead, where C' contains only the changes to fileB but has a new hash (any change to a commit, including its parent, gives it a new hash). One simple way to remove commit B from your local branch's history is with an interactive rebase:

git rebase -i A # where A is the commit ID (hashcode) of A

or, similarly:

git rebase -i HEAD~2 # HEAD~2 is the grandparent of HEAD (two commits back), which is A in this case

When the interactive rebase shows you the list of commits, note that they appear in chronological order (the reverse of git log's order); this is the order in which the rebase replays them. Change pick to drop on B's line (or delete that line completely) and leave C as pick. Save and close the editor, and B will be gone, with C rewritten as C'.
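If you'd rather not edit the todo list by hand, git rebase --onto can drop B non-interactively: it replays only the commits after B onto A. This is a swapped-in alternative to the interactive flow, shown as a minimal sketch in a throwaway repository; the file contents, commit messages and user identity are made up for illustration.

```shell
set -eu
cd "$(mktemp -d)"                       # throwaway repository for the demo
git init -q
git config user.name "Example User"     # hypothetical identity
git config user.email "user@example.com"

printf 'A v1\n' > fileA; printf 'B v1\n' > fileB
git add fileA fileB; git commit -qm "A: starting point"
printf 'A v2\n' > fileA; git commit -qam "B: change fileA"
printf 'B v2\n' > fileB; git commit -qam "C: change fileB"

# Replay everything after HEAD~1 (commit B) onto HEAD~2 (commit A).
# The effect matches marking B as "drop" in git rebase -i HEAD~2.
git rebase -q --onto HEAD~2 HEAD~1

git log --format=%s    # C: change fileB, then A: starting point
cat fileA fileB        # A v1, then B v2
```

Afterwards the branch holds two commits, and C has been rewritten as C' with a new hash ID.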

Side note 1: if you've already pushed your branch to the remote, and the branch is shared or has already been merged into another shared branch, you probably don't want to rewrite its history. In that case it's better to create a new commit D that undoes the changes made in B. You do this with git revert B, which yields A-B-C-D, where D is the reverse of B.
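A minimal sketch of the revert route, in a throwaway repository with made-up file contents, commit messages and user identity. The --no-edit flag just accepts Git's default message instead of opening an editor.

```shell
set -eu
cd "$(mktemp -d)"                       # throwaway repository for the demo
git init -q
git config user.name "Example User"     # hypothetical identity
git config user.email "user@example.com"

printf 'A v1\n' > fileA; printf 'B v1\n' > fileB
git add fileA fileB; git commit -qm "A: starting point"
printf 'A v2\n' > fileA; git commit -qam "B: change fileA"
printf 'B v2\n' > fileB; git commit -qam "C: change fileB"

# Add a new commit D that undoes B; commits A, B and C stay exactly as they were.
git revert --no-edit HEAD~1 >/dev/null

git log --format=%s    # Revert "B: change fileA", then C, B, A
cat fileA fileB        # A v1, then B v2
```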

Side note 2: if you don't want to rewrite your history, and you cannot simply restore the file by reverting a commit, you can restore the file to a previous version in a new commit, using torek's answer. That approach has the added benefit that you can restore a specific version of a file regardless of what has happened to that file since then.


torek


TL;DR

Use:

git restore --source=<hash-id> --worktree --staged -- fileA

(assuming Git 2.23 or later).

Long

You're on the right track when you say "Git seems to take the snapshots of the entire working directory." Git does take snapshots. But, as we'll see, it is not quite the working tree that gets snapshotted.

First, though, let's take note of this: Git is not about files, but rather about commits. Git is not about branches either: it's about commits. A commit contains files, so we do get to save files. A branch name like main or master helps us find commits. But it's the commits that matter. So we need to know exactly what a commit is and does, and how we find one.

As you have also seen, each commit has a unique hash ID, expressed as a huge, ugly, impossible-to-remember hexadecimal number. That number is the lowest-level method by which Git finds a commit. If you have some particular commit, it has that number, no matter how you got it, and giving that number to Git will get you access to that commit. (If you don't have that commit, Git will say that it doesn't recognize the number; you'll have to hook your Git up to some other Git that does have that commit, and get it from them.)

We can't really use these numbers, most of the time, but we'll remember that they are the bottom level method by which Git finds commits. Now let's look at what's in a commit. Each commit holds two things:

A commit holds a full snapshot of every file that Git knew about, as of the form it had when you (or whoever) made the commit. We'll revisit this a bit more in a moment.
And, a commit holds some metadata, or information about that commit. The metadata include things like the name and email address of the person who made the commit. There are two of these (author and committer), and two date-and-time stamps as well. There is room for a log message, where you explain why you made the commit. And—crucially for Git's own operation—each commit stores a list of previous commit hash IDs. Usually this list has just one entry in it: the (single) parent of this commit.
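To see both halves of a commit for yourself, git cat-file -p prints the raw commit object. The sketch below assumes a throwaway repository and a hypothetical user name and email; the output should show a tree line (the snapshot), a parent line, author and committer lines with timestamps, and then the log message.

```shell
set -eu
cd "$(mktemp -d)"                       # throwaway repository for the demo
git init -q
git config user.name "Example User"     # hypothetical identity
git config user.email "user@example.com"

echo one > fileA
git add fileA; git commit -qm "first"
echo two > fileA
git commit -qam "second: explain why here"

# Raw commit object: tree (snapshot), parent hash ID, author, committer, message.
git cat-file -p HEAD
```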

Whenever we have a hash ID for some existing commit, we say that we're pointing to that particular commit. So each commit generally points backwards to one previous commit. This makes a backwards-looking chain:

... <-F <-G <-H

where H stands in for the hash ID of the last commit in the chain. Commit H holds a snapshot and some metadata—who made it, when, why, etc.—and this metadata includes the hash ID of earlier commit G. Git can then use the hash ID of G to retrieve the same stuff from that earlier commit.
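The backwards walk can be imitated by hand: start from HEAD's hash ID and keep asking for the parent (the ^ suffix) until there isn't one. This is only an illustration of the idea using made-up commits F, G and H in a throwaway repository, not a description of how git log is implemented internally.

```shell
set -eu
cd "$(mktemp -d)"                       # throwaway repository for the demo
git init -q
git config user.name "Example User"     # hypothetical identity
git config user.email "user@example.com"

for msg in F G H; do
  echo "$msg" > file
  git add file
  git commit -qm "$msg"
done

# Start at the last commit and follow each stored parent hash ID backwards.
c=$(git rev-parse HEAD)
visited=""
while :; do
  visited="$visited $(git log -1 --format=%s "$c")"
  # A root commit has no parent, so --verify fails and the loop ends.
  c=$(git rev-parse -q --verify "$c^") || break
done
echo "$visited"    # " H G F"
```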

Comparing the two snapshots tells us—or Git—what changed. That's generally what we'll see if we look at the commit, with git log -p or git show for instance. The actual commits each contain their own full snapshot, but we only see the files that changed. And, of course, having shown us commit H by comparing it against commit G, git log can now step back one hop to commit G itself. Git can now retrieve the snapshot from F and thus show us changes for G. From here, Git can step back one to commit F, and so on.
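A quick way to see the difference between what a commit stores and what git log -p or git show display: git ls-tree lists every file in a commit's snapshot, while git diff between two commits lists only what changed. A sketch in a throwaway repository with made-up files:

```shell
set -eu
cd "$(mktemp -d)"                       # throwaway repository for the demo
git init -q
git config user.name "Example User"     # hypothetical identity
git config user.email "user@example.com"

echo a > fileA; echo b > fileB
git add fileA fileB; git commit -qm G
echo b2 > fileB; git commit -qam H

git ls-tree --name-only HEAD        # full snapshot of H: fileA and fileB
git diff --name-only HEAD~1 HEAD    # comparing G's and H's snapshots: only fileB
```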

All of this requires one thing: Git needs to know the hash ID of the last commit H in the chain. That's where a branch name comes in: a name like main or master simply holds the hash ID of the last, or tip, commit in its chain. This is true even if other chains have newer commits, and that's where we see "branches":

...--G--H   <-- main
         \
          I--J   <-- feature

Here, the name feature lets Git locate commit J. Commit J points backwards to commit I, which points backwards to H, and so on.
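A minimal sketch that builds the main/feature picture above in a throwaway repository (assuming Git 2.23 or later for git switch) and shows that each name simply resolves to its tip commit's hash ID. The commit messages G, H, I and J only mirror the letters in the diagram.

```shell
set -eu
cd "$(mktemp -d)"                         # throwaway repository for the demo
git init -q
git symbolic-ref HEAD refs/heads/main     # make sure the first branch is named main
git config user.name "Example User"       # hypothetical identity
git config user.email "user@example.com"

for msg in G H; do echo "$msg" > file; git add file; git commit -qm "$msg"; done
git switch -q -c feature                  # new name pointing at H; HEAD now attached to it
for msg in I J; do echo "$msg" > file; git commit -qam "$msg"; done

# Each branch name just stores the hash ID of its tip commit.
git rev-parse main feature
git log -1 --format=%s main      # H
git log -1 --format=%s feature   # J
git rev-parse feature~2          # same hash ID as main: J -> I -> H
```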

With all that in mind, here's the direct answer to your question:

Is there any way to just discard changes in fileA while keeping the changes fileB?

What we need to do is extract just one file from some earlier snapshot.

In Git 2.23 or later, the best command (by some measures anyway) for doing this is git restore. In earlier versions of Git, the command for doing this is git checkout. That still works in Git 2.23-and-later, too; it's just that git checkout is for many things, and git restore is for fewer things, so that git restore is a more focused tool.

We use whichever of these two commands we have (or just git checkout if we're still stuck with Git 1.7 in our heads, as happens to me a lot :-) ) to extract the one file we care about:

git checkout <hash-id> -- fileA

or:

git restore --source=<hash-id> --worktree --staged -- fileA

which both do exactly the same thing.
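Here is the whole scenario from the question as a runnable sketch: a throwaway repository, made-up file contents, and a shell variable start standing in for <hash-id>. It assumes Git 2.23 or later for git restore. The final commit records the restored fileA as a new commit, so fileB's change is kept and no history is rewritten.

```shell
set -eu
cd "$(mktemp -d)"                       # throwaway repository for the demo
git init -q
git config user.name "Example User"     # hypothetical identity
git config user.email "user@example.com"

printf 'A v1\n' > fileA; printf 'B v1\n' > fileB
git add fileA fileB; git commit -qm "starting point"
start=$(git rev-parse HEAD)             # stands in for <hash-id>
printf 'A v2\n' > fileA; git commit -qam "change fileA"
printf 'B v2\n' > fileB; git commit -qam "change fileB"

# Old style: copy fileA from the earlier commit into the index and working tree.
git checkout "$start" -- fileA
# New style (Git 2.23+), same effect; here it finds nothing left to change.
git restore --source="$start" --worktree --staged -- fileA

git status --short       # M  fileA  (staged, ready to commit)
git commit -qm "Put fileA back the way it was"
cat fileA fileB          # A v1, then B v2
```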

Git's index and your working tree

There's something else important to know about Git commits, or in fact, any Git object (commits are made up of three of Git's internal objects): they are completely, totally, 100% read-only. Nothing—not even Git itself—can change one once it's made. Since it's physically impossible to change a commit, you don't actually work on or with a commit directly. The files stored inside a commit are kept in a special, Git-only, compressed and de-duplicated form. This helps Git out a lot for multiple reasons: for instance, it makes it really easy to see which files are the same between any pair of commits, because if they are the same, they're de-duplicated. It also means that no matter how many commits hold some particular version of a 100 megabyte file, there's only one stored version in reality.
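De-duplication is easy to observe: if a file's content doesn't change between two commits, both snapshots name the same blob object. A sketch in a throwaway repository, using seq to generate a stand-in for a large file (an assumption; any big file would do):

```shell
set -eu
cd "$(mktemp -d)"                       # throwaway repository for the demo
git init -q
git config user.name "Example User"     # hypothetical identity
git config user.email "user@example.com"

seq 1 100000 > big.txt                  # stand-in for a large file
echo v1 > small.txt
git add big.txt small.txt; git commit -qm first
echo v2 > small.txt; git commit -qam second   # big.txt is unchanged

# Both commits refer to the very same blob object for big.txt.
git rev-parse HEAD~1:big.txt HEAD:big.txt
# A blob's ID depends only on its content, so hashing the file yields the same ID.
git hash-object big.txt
```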

In any case, to use a commit, Git has to extract that commit into your working tree. That's one of the main functions of git checkout (or, in Git 2.23 or later, git switch): it will take all the files out of some commit, expanding them out into useful form, to fill in your working tree.

At the same time, though, this kind of git checkout (or git switch) does two more things: it attaches the special name HEAD to some branch name, and it fills in Git's index from that commit. The attaching-of-HEAD is important when making new commits. So is Git's index.

The index—which is also called the staging area, or sometimes the cache (rarely now, mostly in flags like git rm --cached)—is a key data structure in Git. What the index holds gets complicated in some corner and extended cases, but for the most part, it can be described pretty simply as holding your proposed next commit, or at least its snapshot. (All of the next commit's metadata are generated on the fly when you make that next commit.)

The snapshots, in other words, are made from what's in Git's index, not from what is in your working tree. This is the reason Git makes you run git add over and over again. Each git add is really a directive to Git: Take whatever I have in this file / these files in my working tree, and update your index to match.
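A small demonstration that git commit takes its snapshot from the index rather than the working tree: stage one version of a file, edit it again without running git add, then commit. The file name, contents and user identity are made up.

```shell
set -eu
cd "$(mktemp -d)"                       # throwaway repository for the demo
git init -q
git config user.name "Example User"     # hypothetical identity
git config user.email "user@example.com"

echo "version 1" > notes.txt
git add notes.txt                # the index now holds "version 1"
echo "version 2" > notes.txt     # the working tree moves on; the index does not
git commit -qm "commit what is staged"

git show HEAD:notes.txt   # version 1 (what was in the index at commit time)
cat notes.txt             # version 2 (still only in the working tree)
git status --short        #  M notes.txt
```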

This extra "copy" of each file, in Git's index, is actually already in the compressed and de-duplicated form, so that initially, at least, it takes no real extra space. As you modify working tree files and git add them, Git has to compress the files and turn them into ready-to-store internal blob objects, which do take a bit of space, but this makes the git commit later go very fast: everything is already in commit-able form. The drawback is that you must be aware of Git's index.

When a file exists in Git's index—as it does right after a git checkout or git switch, for instance, because Git extracts the commit to both Git's index and your working tree—that file is called tracked. A file that exists only in your working tree, not in Git's index, is called untracked.

Running git add generally takes an untracked file and copies it into Git's index (as a new entry), after which it is tracked. Running git rm takes a file out of both Git's index and your working tree; running git rm --cached takes a file out of Git's index without touching your working tree, and now the file is untracked. So the tracked-ness of any file is under your control; but remember that a git checkout of a commit that has the file causes the file to go into both the index/staging area and your working tree.
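These tracked/untracked transitions can be watched with git ls-files, which lists what is in the index. A sketch in a throwaway repository with a made-up file name:

```shell
set -eu
cd "$(mktemp -d)"                       # throwaway repository for the demo
git init -q
git config user.name "Example User"     # hypothetical identity
git config user.email "user@example.com"

echo data > keep.txt
git ls-files                 # prints nothing: keep.txt is untracked
git add keep.txt
git ls-files                 # keep.txt: it is in the index, so it is tracked
git commit -qm "track keep.txt"

git rm -q --cached keep.txt  # remove it from the index only
git ls-files                 # prints nothing again
ls keep.txt                  # the working-tree copy is still there
git status --short           # D  keep.txt (staged removal) and ?? keep.txt (untracked)
```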

When you use git checkout hash -- file, Git copies this file to both places. When you use the newfangled git restore, you get to choose whether Git copies it to the index/staging-area, your working tree, or both. So this is another way that git restore is better, or at least different.
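With git restore, --staged targets the index and --worktree targets the working tree; the sketch below does one and then the other so the difference is visible. The spelling :fileA (a leading colon) names the index copy of fileA. It assumes Git 2.23 or later and a throwaway repository with made-up contents.

```shell
set -eu
cd "$(mktemp -d)"                       # throwaway repository for the demo
git init -q
git config user.name "Example User"     # hypothetical identity
git config user.email "user@example.com"

echo old > fileA; git add fileA; git commit -qm old
echo new > fileA; git commit -qam new

# Index (staging area) only: the working tree keeps "new".
git restore --source=HEAD~1 --staged -- fileA
git show :fileA    # old
cat fileA          # new

# Now the working tree only.
git restore --source=HEAD~1 --worktree -- fileA
cat fileA          # old
```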

Note that git checkout was basically split into git switch (change branches / full checkout) and git restore (individual files), so that's really the primary difference between the old-style commands and the new ones. However, in versions before Git 2.23, git checkout had a nasty "feature" / bug of sorts:

git checkout xyzzy

might mean either:

git checkout xyzzy --    # check out the branch

or:

git checkout -- xyzzy    # restore the file

and it was not always clear which one would run, or which one you wanted, in the ambiguous cases. Since 2.23, git checkout now notices this bad case and demands disambiguation; git switch and git restore, being separate commands, are already clear on which one you mean.
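A sketch of the ambiguous case, with a file and a branch both named xyzzy, resolved by where the -- goes. The names come from the text; the repository is a throwaway one with made-up contents.

```shell
set -eu
cd "$(mktemp -d)"                       # throwaway repository for the demo
git init -q
git config user.name "Example User"     # hypothetical identity
git config user.email "user@example.com"

echo original > xyzzy                   # a file named xyzzy...
git add xyzzy; git commit -qm "add xyzzy"
git branch xyzzy                        # ...and a branch with the same name

echo scribble > xyzzy
git checkout -q -- xyzzy                # "--" before the name: restore the file
cat xyzzy                               # original

git checkout -q xyzzy --                # "--" after the name: switch branches
git symbolic-ref --short HEAD           # xyzzy

# With the split commands there is nothing to guess:
#   git restore xyzzy    (file)
#   git switch xyzzy     (branch)
```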

There's one more thing to touch on here, at least lightly, and that's the .gitignore file. This file is misnamed: it does not mean ignore these files. Instead, it means: If these files are untracked, do not complain about them being untracked, and do not automatically add them to Git's index so that they become tracked. Once some file is tracked, however, listing that file in a .gitignore has no effect.
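A sketch of that rule in a throwaway repository: a tracked file matching a .gitignore pattern is still reported as modified, while an untracked file matching the same pattern is silently left out. File names are made up.

```shell
set -eu
cd "$(mktemp -d)"                       # throwaway repository for the demo
git init -q
git config user.name "Example User"     # hypothetical identity
git config user.email "user@example.com"

echo one > app.log
git add app.log; git commit -qm "track app.log"
echo '*.log' > .gitignore
echo other > debug.log                  # untracked and matches *.log
echo two > app.log                      # tracked, and also matches *.log

git status --short --untracked-files=all
#  M app.log      <- still reported: it is tracked, so .gitignore has no effect
# ?? .gitignore   <- debug.log is not listed: it is untracked and ignored
```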

Naming commits

As you've no doubt seen, naming commits by hash IDs is annoying. Running git log and using cut-and-paste works fine (and I do it a lot myself), but there are lots of useful ways to name commits. These are documented in the gitrevisions manual page. It is quite complex, and worth re-reading frequently over time as you get used to some of Git's conventions and style.

The main one to think about here is the relative one. Suppose you know that you want the version of some file from the immediately previous commit. You could run git log to get its hash ID, then use git checkout or git restore to grab it. But since HEAD always names the current commit, if we just tell Git to step back one commit from HEAD, we'll get the right (previous) commit. We can do that with either the hat ^ or tilde ~ suffix:

git checkout HEAD^ -- fileA

or the same with tilde. (Use whichever you prefer; if your particular command-line interpreter is fussy about hat or tilde characters, consider using the other one.) Or we can move back two commits: here we need the tilde suffix, as the hat suffix has a different meaning with numbers (HEAD^2 means the second parent of a merge commit, not two steps back):

git checkout HEAD~2 -- fileA
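The relative suffixes can be checked with git rev-parse and git show in a throwaway repository with three made-up commits. HEAD^2 is probed with --verify only to show that it means something else (a second parent) and fails on an ordinary, non-merge commit.

```shell
set -eu
cd "$(mktemp -d)"                       # throwaway repository for the demo
git init -q
git config user.name "Example User"     # hypothetical identity
git config user.email "user@example.com"

for v in 1 2 3; do echo "v$v" > fileA; git add fileA; git commit -qm "v$v"; done

git show HEAD^:fileA     # v2 (one step back; HEAD~ and HEAD~1 mean the same)
git show HEAD~2:fileA    # v1 (two steps back; HEAD^^ also works)
# HEAD^2 asks for the second parent of HEAD, which only merge commits have.
git rev-parse -q --verify HEAD^2 || echo "HEAD^2: no second parent"

git checkout HEAD~2 -- fileA   # fileA from two commits ago, into index and working tree
cat fileA                      # v1
```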

With our example of:

...--G--H   <-- main
         \
          I--J   <-- feature (HEAD)

Here the current branch is feature and the current commit is J, so HEAD~2 or feature~2 means commit H. We can also just use the name main directly:

git checkout main -- fileA

If we want to get to commit G, we can name that as feature~3 (count back three steps: J-1 = I, -1 = H, -1 = G) or main~1 or main^ or main~.
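To double-check the counting, the sketch below rebuilds the example graph in a throwaway repository (with one extra commit F so that G has a parent), assuming Git 2.23 or later for git switch, and prints which commit each spelling resolves to. Every line should end in G.

```shell
set -eu
cd "$(mktemp -d)"                         # throwaway repository for the demo
git init -q
git symbolic-ref HEAD refs/heads/main     # make sure the first branch is named main
git config user.name "Example User"       # hypothetical identity
git config user.email "user@example.com"

# Build ...--G--H <-- main, with I--J <-- feature (HEAD) on top of H.
for c in F G H; do echo "$c" > fileA; git add fileA; git commit -qm "$c"; done
git switch -q -c feature
for c in I J; do echo "$c" > fileA; git commit -qam "$c"; done

# Each spelling below should name commit G.
for rev in feature~3 HEAD~3 main~1 main^ main~; do
  printf '%-10s %s\n' "$rev" "$(git log -1 --format=%s "$rev")"
done
```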

These two—hat and tilde—can take you a long way. After that, there are methods like searching for commits, but these get pretty tricky sometimes (some searches start from all branches!).