Reputation: 1284
Git shows I have a modified file:
> git status StdAfx.cpp
On branch MyBranch
Your branch is up to date with 'origin/MyBranch'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: StdAfx.cpp
no changes added to commit (use "git add" and/or "git commit -a")
When I diff it, it says they are different even when they are not:
> git diff StdAfx.cpp
diff --git a/somedir/StdAfx.cpp b/somedir/StdAfx.cpp
index fb0263ae54..92b182f686 100644
--- a/somedir/StdAfx.cpp
+++ b/somedir/StdAfx.cpp
@@ -1,5 +1,5 @@
-// stdafx.cpp : source file that includes just the standard includes
-// blah.pch will be the pre-compiled header
-// stdafx.obj will contain the pre-compiled type information
-
-#include "stdafx.h"
+// stdafx.cpp : source file that includes just the standard includes
+// blah.pch will be the pre-compiled header
+// stdafx.obj will contain the pre-compiled type information
+
+#include "stdafx.h"
No whitespace difference at any rate:
> git diff -w StdAfx.cpp
Maybe line endings?
> mv StdAfx.cpp StdAfx.cpp.save
> git checkout StdAfx.cpp
> diff StdAfx.cpp StdAfx.cpp.save
> git diff -w StdAfx.cpp
> file StdAfx.cpp*
StdAfx.cpp: C source, ASCII text, with CRLF line terminators
StdAfx.cpp.save: C source, ASCII text, with CRLF line terminators
Nope.
Maybe permissions?
> ls -l StdAfx.cpp*
-rwxrwxrwx 1 me me 208 Apr 24 15:01 StdAfx.cpp
-rwxrwxrwx 1 me me 208 Apr 24 15:00 StdAfx.cpp.save
Nope.
Still thinks I've made changes though, even though I checked it out above:
> git status StdAfx.cpp
On branch MyBranch
Your branch is up to date with 'origin/MyBranch'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: StdAfx.cpp
no changes added to commit (use "git add" and/or "git commit -a")
Still thinks there are diffs...
> git diff StdAfx.cpp
diff --git a/somedir/StdAfx.cpp b/somedir/StdAfx.cpp
index fb0263ae54..92b182f686 100644
--- a/somedir/StdAfx.cpp
+++ b/somedir/StdAfx.cpp
@@ -1,5 +1,5 @@
-// stdafx.cpp : source file that includes just the standard includes
-// blah.pch will be the pre-compiled header
-// stdafx.obj will contain the pre-compiled type information
-
-#include "stdafx.h"
+// stdafx.cpp : source file that includes just the standard includes
+// blah.pch will be the pre-compiled header
+// stdafx.obj will contain the pre-compiled type information
+
+#include "stdafx.h"
Why does git think I've made changes when I haven't? And how can I fix this? Note the problem above is repeated for many, many files.
Note: I am using WSL on Windows 10.
Upvotes: 2
Views: 107
Reputation: 1284
The problem was solved by repeated applications of
git reset --hard # be careful! will wipe out real changes!
git add -u
git reset # to unstage them
git reset --hard # again
Not a very satisfactory answer because I couldn't identify a root cause. Perhaps this is due to line endings, but I work with a similar tool set and settings across multiple projects and only this one is a problem child. Or else some weird file permissions issue with WSL. But those are little better than guesses.
But I post the answer here in case someone else gets stuck like this.
Thanks to @torek for his great answer.
Upvotes: 0
Reputation: 488183
It really is almost certainly line endings (CRLF vs LF). I recommend not setting autocrlf=true; if you must set up Git to mess with line endings, use .gitattributes to set things. I don't have one single Right Way recommendation, because the problem you have here is a bit thorny and all solutions have perils. You need to know what you're getting into, and choose your own course.
If existing committed files—which you can't see directly; they're stored in a Git-only format—have CRLF line endings, turning off all manipulation will make the problem vanish. Future committed files will continue to have CRLF line endings. You may or may not want to use this solution.
If existing committed files have LF-only line endings, you probably want to turn on certain manipulations. You may want this anyway, along with a one-time commit to fix all the repository files, but you need to be aware of what this does and how it presents everything in the future.
When you're on a system that uses CRLF line endings, and you tell Git to mess with your line endings, Git will mess with your line endings, because you told it to.[1] Setting core.autocrlf tells Git: guess what I have in my files, and based on your guesses, mess with line endings. This is good because it's easy, but it's bad because Git does not necessarily guess correctly.
Using .gitattributes, you can mark particular files as text and others as binary, and tell Git: for the text files, mess with my line endings; for the binary ones, do not touch these at all. This is much more work, but also much safer: Git stops guessing and does what you tell it. You can also tell it, explicitly, exactly how to mess with line endings.
The Git project folks themselves use .gitattributes, e.g.:
$ head -8 .gitattributes
* whitespace=!indent,trail,space
*.[ch] whitespace=indent,trail,space diff=cpp
*.sh whitespace=indent,trail,space eol=lf
*.perl eol=lf diff=perl
*.pl eof=lf diff=perl
*.pm eol=lf diff=perl
*.py eol=lf diff=python
*.bat eol=crlf
which is as close as I will get to a particular recommendation. Note that *.bat (batch files) are the only files with eol=crlf here.
[1] You can just tell Git never to mess with anything, and it won't. That's the way I prefer to use it, but I also avoid systems that use CRLF line endings, so that I never face the problem in the first place.
You tried to check for this sort of CRLF fiddling:
> mv StdAfx.cpp StdAfx.cpp.save
> git checkout StdAfx.cpp
> diff StdAfx.cpp StdAfx.cpp.save
> git diff -w StdAfx.cpp
[shows nothing]
but this doesn't really help, because one of the programs that puts the CRLF line endings in, thereby (potentially) changing the file, is git checkout itself! The two files don't differ here, because if git checkout changed them, it changed them the same way each time.
There are two things Git can do to files, in two different phases. To understand this correctly, you need to know something about how Git stores files.
You need to know—maybe you do already—that each Git commit stores a full copy of every file, as a snapshot, but stores it in a special, read-only, Git-only, compressed and frozen format. You can't actually see these files: you have to have Git extract them, and then you can inspect the extracted files. But Git has to transform them during extraction.
Of course, if all your files were frozen forever and unreadable as well, Git would be useless. So Git does extract your files, to regular everyday files that you can list out, open in an editor, check for CRLF line endings, and so on. These useful files form what Git calls your working tree or work-tree. But these are transformed files. They're not what's actually in the commit. In a pretty strong sense, these files are not in the repository at all! They just exist so that you can get work done.
You also need to know about Git's index. The index, which Git also calls the staging area, is an intermediate step in both extracting files, and putting files into new commits. We'll skip many of the details, but note that the index acts as Git's proposed next commit and is initially filled-in by copying the frozen-format files out of the commit you check out. What's in the index includes:
- the file's path name, e.g. path/to/file.ext;
- the file's mode, +x or -x, stored as mode 100755 or mode 100644;
- and a copy of the file's content, in the frozen format.
Unlike the copy in the commit, though, the index copy isn't actually frozen, in that you can replace it, or even delete it. This is what git add and git rm do: git add in particular takes the work-tree copy and transforms it, compressing it into the frozen format, ready to go into the next commit.
What this means is that the index initially contains all the files from the current commit. You manipulate your work-tree files, copy the results back into the index, and then run git commit, and Git makes the new frozen commit from the index data, not from the work-tree files.
Note that there are two copying steps:
- index -> work-tree: git checkout does this: it fills the index and it fills your work-tree from the index.
- work-tree -> index: git add does this, for instance.
Each of these two steps can make changes. With core.autocrlf set, each does.
Git can treat any given file as either text, meaning it consists of lines with line endings, or binary, meaning it consists of bytes that must be preserved: hands off! no touching! With core.autocrlf set, Git will guess whether a file is text or binary by examining some or all of its bytes. (Note: different vintages of Git have different internal tests here.)
Since binary files are merely uncompressed without changing any of their bytes, those files are safe. Git doesn't modify them during either copy. So only text files are interesting here.
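As a rough illustration of the text-versus-binary guess described above, here is a minimal Python sketch. It is a simplification under stated assumptions, not Git's actual test: the helper name looks_binary and the 8000-byte sample size are illustrative choices, and real Git versions apply further checks.

```python
def looks_binary(data: bytes, sample_size: int = 8000) -> bool:
    """Guess 'binary' if a NUL byte appears near the start of the data.

    Hypothetical helper: a simplified stand-in for the kind of heuristic
    Git applies, not a copy of Git's real logic.
    """
    return b"\x00" in data[:sample_size]


source = b'#include "stdafx.h"\r\n'
image = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"

print(looks_binary(source))  # False: no NUL bytes, so treated as text
print(looks_binary(image))   # True: NUL bytes present, so left untouched
```

A guess of this kind can misfire: UTF-16 text, for example, is full of NUL bytes and would be classified as binary here. That is the sort of risk that explicit .gitattributes entries avoid.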
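There are literally only two changes that Git can make on its own. You can define smudge and clean filters if you want more changes (these operate right in the same space, during the copying), but Git only does two things:
- during the index -> work-tree copy, replace LF-only line endings with CRLF;
- during the work-tree -> index copy, replace CRLF line endings with LF-only.
That's all Git can do on its own. If you set core.autocrlf, Git does both.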
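To make the consequence concrete, here is a small Python sketch of the two conversions (LF to CRLF going from index to work-tree, CRLF to LF going from work-tree to index). It is a simplified model, not Git's code: the function names to_worktree, to_index, and blob_id are made up for illustration, and the blob ID formula assumes a repository that uses SHA-1 object names.

```python
import hashlib


def to_worktree(blob: bytes) -> bytes:
    """index -> work-tree: turn LF-only endings into CRLF."""
    # Normalize first so existing CRLF pairs are not doubled into CR CR LF.
    return blob.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def to_index(worktree: bytes) -> bytes:
    """work-tree -> index: turn CRLF endings into LF-only."""
    return worktree.replace(b"\r\n", b"\n")


def blob_id(data: bytes) -> str:
    """SHA-1 object ID for a blob: hash of 'blob <size>\\0' plus the content."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


# A file that was committed with CRLF line endings.
committed = b'// stdafx.cpp\r\n\r\n#include "stdafx.h"\r\n'

checked_out = to_worktree(committed)  # what lands in the work-tree
re_added = to_index(checked_out)      # what a fresh 'git add' would store

print(checked_out == committed)                 # True
print(blob_id(committed) == blob_id(re_added))  # False
```

Under this model the checked-out file is byte-for-byte what was committed, so diff and file see nothing odd, yet the content Git would store on the next add has a different object ID. That would be consistent with the differing hashes on the index line of the diff in the question, and with git diff -w showing nothing.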
If you use a .gitattributes entry, you gain finer control. The first part is a file-name pattern; * matches every file, or *.txt matches files whose name ends with .txt. The rest are the controls. Note that the text attribute in all cases tells Git: do mess with the file. The eol= attribute tells Git how to mess with the file:
*.txt text eol=crlf
Replace LF-only with CRLF during index -> work-tree, and CRLF with LF-only during work-tree -> index.
*.txt text eol=lf
Preserve the file during index -> work-tree, but turn CRLF to LF during work-tree -> index.
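The two settings above can be modeled in a few lines of Python. This is only a sketch of the behavior as described here, with a hypothetical convert function and direction names ("checkout" for index -> work-tree, "add" for work-tree -> index) chosen for illustration.

```python
def convert(data: bytes, eol: str, direction: str) -> bytes:
    """Model 'text eol=<eol>' for one copy step.

    direction is "checkout" (index -> work-tree) or "add" (work-tree -> index).
    """
    if direction == "add":
        # Both settings store LF-only content in the index.
        return data.replace(b"\r\n", b"\n")
    if eol == "crlf":
        # Normalize first so existing CRLF pairs are not doubled.
        return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    # eol=lf: checkout leaves the bytes as they are.
    return data


crlf_blob = b"line one\r\nline two\r\n"  # stored with CRLF endings
lf_blob = b"line one\nline two\n"        # stored with LF-only endings

# Under eol=crlf, both blobs produce the same work-tree bytes.
print(convert(crlf_blob, "crlf", "checkout") == convert(lf_blob, "crlf", "checkout"))  # True
# Under eol=lf, a CRLF blob is checked out unchanged; only 'add' strips the CRs.
print(convert(crlf_blob, "lf", "checkout") == crlf_blob)  # True
print(convert(crlf_blob, "lf", "add") == lf_blob)         # True
```

The point of the model is the asymmetry: with eol=lf, nothing changes on the way out, so a blob that was stored with CRLF only gets cleaned up the next time the file is added.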
There is an older set of attributes that you can write as, e.g.:
*.txt crlf
*.jpg -crlf
*.sh crlf=input
These tell Git the same thing as *.txt text, *.jpg -text, and *.sh text eol=lf, respectively. Note that the plain crlf form only marks the file as text; it does not force CRLF in the work-tree. You should only use these if your Git version is too old to understand the text and eol= directives. Check your particular installation's git help attributes output.
All existing commits are frozen for all time. If some file(s) in those commits are committed with CRLF line endings, they are that way forever, in those commits.
If you choose eol=crlf as a setting, any future commit in which you have changed the file at all will store every line with LF-only endings inside the committed copy (the copy you can't see). That's because when you git add your updated file, Git will turn all the CRLF endings to LF-only endings as it replaces the stored file in the index.
Extracting either the old (CRLF-ending) or the new (LF-only) commits will produce work-tree files that have CRLF line endings. Although Git will say that the files changed in the commit that stripped out all the carriage-returns—and they did change—you won't be able to smell the change in your work-tree because it won't happen there.
If the committed copy of the file has CRLF line endings, but you don't touch the work-tree copy of the file, Git won't copy the work-tree file back to the index copy (at least, not normally). So the index copy, which is literally exactly the same as the committed copy since they share an internal Git blob object, will still have the CRLF line endings, even though your .gitattributes says eol=crlf, implying that adding the file (copying it back into the index) will strip out the carriage returns. That mismatch is the likely reason git status and git diff report the file as modified even though you never edited it.
Newer versions of Git have git add --renormalize to let you re-apply an updated .gitattributes.
In many versions of Git (though I think this may have been fixed in recent ones), changing the .gitattributes file doesn't signal Git to check whether the eol= settings might have changed. There is a simple workaround for this: remove or rename the file, and re-extract it, as you did to try to test for CRLF line endings earlier.
That is, after changing the .gitattributes copy in your work-tree, you may need to trick Git into redoing extractions (by removing or renaming the file, then using git checkout or git restore to copy from index to work-tree again). You can similarly trick Git into redoing git add, by using touch or similar on the file to update its time-stamp, then git add the file. (Without --renormalize and/or bug fixes, Git doesn't notice that a new work-tree -> index copy would do something different this time.)
Upvotes: 1