Reputation: 131546
I've recently started using git after having gotten used to mercurial.
In mercurial, if I hg add some files, then hg diff, I get a patch which I can then apply, theoretically, with a simple patch -p1 and get the exact same local copy.
Now, with git, things are different: you git diff before you git add. But how can I make git diff also cover all of the untracked files, like hg diff after hg add-ing?
Upvotes: 4
Views: 1462
Reputation: 488183
git diff --cached
Mercurial and Git have different philosophies here. Git explicitly exposes what Git calls the index. Mercurial doesn't have the index (it has something similar internally but does not expose it, so you need not even be aware that it exists). Many people who prefer Git believe the exposed index is great, and many people who curse Git believe it is terrible. :-) Nonetheless, it's what is getting in your way here, and if you are using Git, you are using the index, so it's time to learn what it is and how to deal with it.
So, let's define "the index". Git's index—which is also called the staging area and sometimes the cache—is a complicated little beastie, with a lot of mostly hidden aspects that Git doesn't normally expose. It does, however, have a simple definition that you will need to know: it's where you build the next commit to make.
It's worth a small aside here on another difference between Git and Mercurial. Mercurial stores changes—changesets, to be technical—while Git stores snapshots. Most of the time, this makes no real difference. A snapshot is easily converted to a changeset: just diff the snapshot against its parent. Given the parent-as-snapshot, a changeset is easily converted to a new snapshot: just apply the changeset. Applying a very long chain of changesets is slow, though, so Mercurial periodically stores a snapshot. It does all this behind the scenes and you never have to be aware of it. Git, as usual, exposes everything (it's kind of like a flasher or streaker that way, running around naked, exposing dangly bits no one really wants to see).
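To make the snapshot-to-changeset conversion concrete, here is a minimal sketch. The throwaway repository, the file name f.txt, and the identity settings are all made up for illustration; the only real point is that diffing a commit against its parent gives you the changeset.

```shell
# Throwaway repository; names and identity below are illustrative only.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email "you@example.com"
git config user.name "Example"

echo "line 1" > f.txt
git add f.txt && git commit -q -m "first snapshot"
echo "line 2" >> f.txt
git add f.txt && git commit -q -m "second snapshot"

# Each commit stores a full snapshot; the "changeset" is just a diff
# of the snapshot against its parent.
changeset=$(git diff HEAD~1 HEAD)
printf '%s\n' "$changeset"
```

Running git show HEAD prints the same diff, preceded by the commit's metadata.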
When you run git commit, Git converts whatever is in the index into a commit snapshot. So git add puts a file into the index. If the file is already there, git add replaces the existing copy with a new version taken from the work-tree. If the file is not there yet, git add puts the work-tree version into the index as a new file. Either way, the index version is now updated (staged) and ready to go into the next snapshot.
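A small sketch of what "the commit comes from the index" means in practice, assuming a scratch repository and a made-up file.txt: the file is edited again after git add, and the commit still records the staged copy.

```shell
# Throwaway repository; names and identity below are illustrative only.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email "you@example.com"
git config user.name "Example"

echo "version 1" > file.txt
git add file.txt               # the index now holds "version 1"
echo "version 2" > file.txt    # the work-tree changes; the index does not
git commit -q -m "snapshot taken from the index"

committed=$(git show HEAD:file.txt)   # what the commit recorded
worktree=$(cat file.txt)              # what is on disk right now
echo "committed: $committed / work-tree: $worktree"
```

To get "version 2" into a commit you would have to git add the file again first.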
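To take a file out of the index, you can run git rm. This removes the file from both the index and the work-tree. Or, you can run git rm --cached, which takes it only out of the index, leaving it in the work-tree (but beware, as this can be a bit of a future trap: the next commit records the file as deleted, so anyone who later moves from a commit that has the file to one that doesn't will have Git remove it from their work-tree).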
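Here is a quick side-by-side of the two removal forms, as a sketch in a scratch repository with two made-up files, a.txt and b.txt.

```shell
# Throwaway repository; names and identity below are illustrative only.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email "you@example.com"
git config user.name "Example"

echo a > a.txt
echo b > b.txt
git add a.txt b.txt
git commit -q -m "add two files"

git rm -q a.txt             # removed from the index AND the work-tree
git rm -q --cached b.txt    # removed from the index only

tracked=$(git ls-files)     # lists what is in the index: now nothing
ls                          # b.txt is still on disk, a.txt is gone
```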
Now, because the index / staging-area / cache is exposed like this, you can git diff it. To do that, use git diff --cached or git diff --staged (these have exactly the same meaning; I generally stick with --cached because git rm has --cached but not --staged).
The problem is that this only diffs files that have been updated in the index. More precisely, it runs the equivalent of git diff HEAD <index>, i.e., it compares the current commit to the index's contents. This means that any files you have modified in the work-tree, but not staged yet, are not diff-ed. The solution is trivial though: just git add those files.
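Put together, the whole recipe looks something like the sketch below. The repository, the file names, and the second "copy" directory are made up for illustration, and I use git apply to replay the patch so the example depends only on Git; patch -p1 < file should work on the same patch.

```shell
# Throwaway repository; names and identity below are illustrative only.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email "you@example.com"
git config user.name "Example"

echo "tracked" > old.txt
git add old.txt && git commit -q -m "initial"

echo "tracked, edited" > old.txt   # a modified tracked file
echo "brand new" > new.txt         # an untracked file

git add old.txt new.txt            # stage both
patch=$(mktemp)
git diff --cached > "$patch"       # the patch covers the new file too

# Replay the patch onto a separate copy of the original state.
copy=$(mktemp -d) && git init -q "$copy"
echo "tracked" > "$copy/old.txt"
git -C "$copy" apply "$patch"
```

If you did not actually want those files staged yet, git reset afterwards takes them back out of the index without touching the work-tree.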
.gitignore and untracked vs ignored
Adding a bunch of files one at a time is painful, so you may want to use git add . or git add -A (these are subtly different; see other StackOverflow questions and answers, and note that there was a big change around Git version 2.0 affecting the -A option here). However, your work-tree often has files you don't want to add, and this is when we get into untracked vs untracked-and-ignored files.
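One of the differences between git add . and git add -A shows up when you run them from a subdirectory. The sketch below assumes Git 2.0 or later (older versions limited -A to the current directory as well) and uses a made-up layout with a sub/ directory.

```shell
# Throwaway repository; names below are illustrative only.
repo=$(mktemp -d) && cd "$repo" && git init -q

mkdir sub
echo top > top.txt
echo inner > sub/inner.txt

cd sub
git add .                                # limited to the current directory
after_dot=$(git -C "$repo" ls-files)     # only sub/inner.txt is staged

git add -A                               # whole work-tree (Git 2.0+ behavior)
after_all=$(git -C "$repo" ls-files)     # now top.txt is staged as well
printf 'after "git add .":\n%s\nafter "git add -A":\n%s\n' "$after_dot" "$after_all"
```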
Now that we know what the index is, there's a remarkably (for Git) short and sweet definition of an untracked file. An untracked file is one that is not in the index. That's it—that's all there is to it. If it's in the index, it's tracked. If not, it's not.
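You can check this definition directly, since git ls-files lists what is in the index. A minimal sketch, with two made-up file names in a scratch repository:

```shell
# Throwaway repository; names below are illustrative only.
repo=$(mktemp -d) && cd "$repo" && git init -q

echo x > tracked.txt
echo y > untracked.txt
git add tracked.txt

in_index=$(git ls-files)            # files in the index, i.e. tracked
others=$(git ls-files --others)     # files in the work-tree but not the index
git status --porcelain              # untracked files are marked with "??"
```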
But of course there is a complication (there is in Mercurial as well): if you have a bunch of untracked files, you get a lot of whining from the version control system about them. To shut it up, you can add file names or glob patterns to .gitignore. Note that unlike in Mercurial, you cannot add regular expressions to .gitignore, only glob patterns. This is both good (glob patterns are far easier to get right) and bad (glob patterns are not as powerful as full regular expressions), but in any case, it is what it is.[1]
Neither git add -A nor git add . will automatically add files listed in .gitignore. However, listing a file in .gitignore does not make it untracked. The only thing that makes a file untracked is that it's not in the index. If you accidentally get a file into the index that should not be tracked, you must git rm it from the index (use git rm --cached if you want to keep the work-tree copy).
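Here is a sketch of that accident and its fix, assuming a made-up build.log that was committed before anyone wrote a .gitignore entry for it.

```shell
# Throwaway repository; names and identity below are illustrative only.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email "you@example.com"
git config user.name "Example"

echo "debug output" > build.log
git add build.log && git commit -q -m "oops, committed a log file"

echo "*.log" > .gitignore
still_tracked=$(git ls-files build.log)   # still in the index despite .gitignore

git rm -q --cached build.log              # take it out of the index, keep it on disk
after=$(git ls-files build.log)           # empty: no longer tracked
git check-ignore -q build.log && echo "build.log is now untracked and ignored"
```

Remember that the removal itself still has to be committed.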
People moving from Mercurial to Git usually really hate the index at first. One thing that makes it much more palatable to many is git add -p. Some people have no use at all for this, but for those who do, it is actually quite nice.
The separation that Git gives you between "what is added to the index, and will be in the next commit" and "what is in the work-tree" means that you can check out a branch, modify some items for debug purposes, modify other items (in the same or separate files) to fix a problem or add a feature, and then selectively add only the bug fix or new feature, and not the debug changes.
When you git commit the result, you get a commit that has only the bug fix or new feature, and not the extra debugging.
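Normally you answer git add -p's prompts by hand. The sketch below feeds the answers on standard input instead, purely so that it can run unattended; the file notes.txt and its "debug" and "fix" edits are made up, and they sit far enough apart to land in two separate hunks.

```shell
# Throwaway repository; names and identity below are illustrative only.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email "you@example.com"
git config user.name "Example"

seq 1 20 > notes.txt
git add notes.txt && git commit -q -m "initial"

# Two edits far apart: line 1 is the debug hack, line 20 is the real fix.
sed -e '1s/.*/DEBUG print/' -e '20s/.*/real fix/' notes.txt > notes.tmp
mv notes.tmp notes.txt

# Answer "n" to the first hunk (debug) and "y" to the second (fix).
printf 'n\ny\n' | git add -p notes.txt > /dev/null
git commit -q -m "only the fix"

first_committed=$(git show HEAD:notes.txt | head -n 1)   # untouched in the commit
last_committed=$(git show HEAD:notes.txt | tail -n 1)    # the fix
first_on_disk=$(head -n 1 notes.txt)                     # debug line, still uncommitted
```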
As usual, this has both advantages and disadvantages. It makes it hard to be sure that what you have just committed really works, for instance. Maybe only the extra debug makes it work. Maybe you forgot to git add some part of it. However, because Git kind of encourages "amending" and rewriting commits,[2] and makes committing and branches really cheap, you can work differently in Git than in Mercurial. Mercurial branches are heavier and its commits and rebases and hg histedit are noticeably slower, which discourages this kind of fast and loose commit-recommit-rebase-fixup-squash work. Git strongly encourages this. You should use Git differently, making a lot of temporary commits on a lot of temporary branches. You don't have to, but it's a good idea to try it.
[1] Mercurial supports both glob patterns and regular expressions in .hgignore. Unfortunately, regular expressions (the ones so hard to get right) are far faster in practice than glob patterns. I have had co-workers change globs to regexps for speed, but then get them wrong. If you are converting glob patterns to regexps, remember to anchor them, and watch out for the "." character!
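To illustrate the anchoring and "." pitfalls, here is a rough sketch that uses grep -E as a stand-in for an unanchored regex search. This is an approximation, not Mercurial's actual matcher, and the sample paths are made up. The glob being "converted" is *.o.

```shell
# Sample paths; illustrative only.
paths='main.o
photo.txt
notes/foo.c'

# Naive conversion: unanchored, and "." matches any character.
naive=$(printf '%s\n' "$paths" | grep -E '.o')

# Careful conversion: escape the dot and anchor at the end.
anchored=$(printf '%s\n' "$paths" | grep -E '\.o$')

printf 'naive matches:\n%s\nanchored matches:\n%s\n' "$naive" "$anchored"
```

The naive pattern matches all three paths; the escaped and anchored one matches only main.o.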
[2] In both Mercurial and Git, commits are pretty much permanent. However, both offer history editing and commit --amend. They get there in very different ways: Git makes the new commits by copying old ones, and moves branch names to point to the new commits. This creates "abandoned" objects within the repository. Git uses what it calls reflogs to keep them around for a while, so that you can recover them if you want them, and then eventually expires the reflog entries and "garbage collects" left-over junk to get rid of it completely.
Mercurial literally can't do that, so instead it "strips" changesets, dropping them into strip-backup exported changeset files. You can then re-import them if you want them back. This is much slower than Git's loosey-goosey "commit, recommit, move branch pointer, abandon old objects" method of "rewriting history". Since Git's method costs less, in terms of both time and space, to do—temporary commits that you'll rewrite are often remarkably close to free, although this does depend on "loose object" file sizes—it's much more rewarding to do this in Git.
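The Git side of this can be seen in a scratch repository. In the sketch below (file name and identity are made up), amending produces a new commit with a new hash, while the old one remains in the object store and is still reachable through the reflog.

```shell
# Throwaway repository; names and identity below are illustrative only.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.email "you@example.com"
git config user.name "Example"

echo one > f.txt
git add f.txt && git commit -q -m "first draft"
old=$(git rev-parse HEAD)

echo two >> f.txt
git add f.txt && git commit -q --amend -m "second draft"
new=$(git rev-parse HEAD)

# The amended commit is a copy with a new hash; the old one is abandoned
# but not yet deleted, and the reflog still points at it.
old_type=$(git cat-file -t "$old")
previous=$(git rev-parse 'HEAD@{1}')
git reflog
```

Until the reflog entry expires, something like git reset --hard "$old" would bring the original commit back.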
Upvotes: 8