Reputation: 185
I am trying to set up repository-level rules to convert line terminators to LF for any future commits. My .gitattributes file is as follows:
# Files of the following types should have their line terminators converted to LF
*.c text eol=lf
*.cpp text eol=lf
*.h text eol=lf
*.hpp text eol=lf
*.py text eol=lf
*.json text eol=lf
makefile text eol=lf
Makefile text eol=lf
I added it and committed it to my local branch, and now I want to test it. If I touch a file that has CRLF terminators, it shows up as having unstaged changes in git status. However, if I touch a file that I know for a fact has only LF terminators, it also shows up as having unstaged changes.
foo/Makefile - a file with CRLF terminators
foo/include/someheader.h - a file with LF terminators
$ touch foo/Makefile
$ touch foo/include/someheader.h
$ git status
...
# Changes not staged for commit:
# modified: foo/Makefile
# modified: foo/include/someheader.h
...
Furthermore, if I try to discard changes for either file with git checkout -- <file>, they still appear as having unstaged modifications. git diff shows a deletion and re-addition of every line, even when the lines are identical, including the line terminators.
I am using git version 1.8.3.1 on Linux.
Upvotes: 0
Views: 78
Reputation: 487893
Git version 1.8 is pretty old, and the filtering code has evolved a lot, but I think these particular rules still apply, and certainly the general concepts apply everywhere (including on Linux and Windows). The things to know, to make sense of all of this, are:
Files stored inside Git, in commits, are in a special, read-only, Git-only, frozen format. They cannot be changed—not at all, not one bit. There's no question of changing line-terminators as nothing in this format can be changed. (The reason is that the file is actually identified by its hash ID, with the hash being a cryptographic checksum over the contents of the file, preceded by a header that gives the object type and size. Changing the contents changes the hash ID, so that you don't have this file any more, you have some other file.)
Files stored in your work-tree, where they have their ordinary format and you can see them and work on / with them, are in any old format your computer can deal with, or even one it can't: it's all up to you after all. Git has no control over this data once it's extracted a file from the deep-freeze.
Git's index or staging area (two names for the same thing) sits between the frozen commits and the work-tree. It starts out holding a copy of (or more precisely, a reference to) the frozen file from the commit. The stuff in the staging area is therefore in this frozen format. But unlike what's in a commit, the copies in the index aren't frozen. You can overwrite them at any time, with a new file in the frozen format, ready to be committed.
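The hash-ID point in the first item above can be checked by hand. This is only a sketch, assuming a repository-independent git hash-object, the default SHA-1 object format, and a sha1sum utility on the PATH; the "hello" content is a made-up example, not anything from the question.

```shell
# A blob's ID is the SHA-1 of the header "blob <size>\0" followed by the content.
# The content here ("hello" plus a line terminator) is purely illustrative.
manual_lf=$(printf 'blob 6\000hello\n' | sha1sum | cut -d' ' -f1)
manual_crlf=$(printf 'blob 7\000hello\r\n' | sha1sum | cut -d' ' -f1)

# Ask Git for the same IDs; --stdin without --path applies no conversions.
git_lf=$(printf 'hello\n' | git hash-object --stdin)
git_crlf=$(printf 'hello\r\n' | git hash-object --stdin)

echo "LF content:   $git_lf"
echo "CRLF content: $git_crlf"
```

Changing only the line terminator yields a different ID, which is why a stored file cannot have its line endings "fixed" in place: the result is a different object.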
In any case, at all times, the index has a "staged for commit" copy of each file. This copy starts out being the copy that was in the commit. Using git reset re-sets the index copy to match the commit's copy.[1] Using git add replaces the index copy with a ready-to-freeze copy made from what's in the work-tree.
Or, to put it another way, these are the ways that files get copied:
git checkout and git reset copy from a commit, into the index / staging area.
git checkout goes on to also copy from the index / staging area to the work-tree; so does git reset --hard.
git checkout-index (and modes of git checkout that basically are git checkout-index) copy from the index / staging area to the work-tree.
git add copies from the work-tree to the index / staging area.
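The copy directions listed above can be watched in a throwaway repository. The following is a rough sketch, not a recipe: the temporary directory, the file name demo.txt, and the identity settings are all invented for the illustration.

```shell
set -e
repo=$(mktemp -d)                 # hypothetical scratch repository
cd "$repo"
git init -q .
git config user.name "Example"    # placeholder identity for the demo commit
git config user.email "example@example.com"

printf 'one\n' > demo.txt
git add demo.txt
git commit -q -m "add demo.txt"
committed=$(git rev-parse HEAD:demo.txt)    # blob ID stored in the commit

printf 'two\n' >> demo.txt
git add demo.txt                            # work-tree -> index
staged=$(git rev-parse :demo.txt)           # blob ID now in the index

git reset -q -- demo.txt                    # commit -> index, work-tree untouched
after_reset=$(git rev-parse :demo.txt)
lines_after_reset=$(wc -l < demo.txt | tr -d ' ')

git checkout -- demo.txt                    # index -> work-tree
lines_after_checkout=$(wc -l < demo.txt | tr -d ' ')

echo "committed=$committed staged=$staged after_reset=$after_reset"
echo "work-tree lines: $lines_after_reset after reset, $lines_after_checkout after checkout"
```

The file-oriented git reset only swaps the blob ID in the index slot, so the two-line work-tree file survives it; the work-tree copy changes only once git checkout copies the index entry back out.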
That first kind of copy never changes anything. (It's really just shoving the blob hash ID from a commit into the slot in the index, so it's fast and easy and has no place to change anything.) It's the second and third kind of copies—from index, to work-tree; and from work-tree, to index—that involve actually copying bytes from one file to another. They also have to decompress / unfreeze files (in the index -> work-tree direction) and compress / Git-ify files (in the work-tree -> index direction).
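During these decompressing and compressing copies, Git can change line endings. The changes that Git has built in are:
On the way out of the index, into the work-tree: change LF-only line endings to CRLF.
On the way in from the work-tree, into the index: change CRLF line endings to LF-only.
The control knobs for these actions are:
eol=lf: change CRLF to LF on the way into the index; do no conversion on the way out to the work-tree.[2]
eol=crlf: change CRLF to LF on the way into the index; change LF to CRLF on the way out to the work-tree.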
These conversions are completely disabled if the file is marked -text. They are obeyed if the file is marked text. They are optionally disabled, but obeyed by default, if the file is marked text=auto: in this case, Git inspects the first few thousand bytes of the file to guess whether the file is text. If you care about your files, don't let Git guess how to treat them. :-)
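To see the work-tree-to-index conversion in action, here is a small sketch in a scratch repository. It assumes default settings (in particular, core.safecrlf is not set to true, so git add only warns about the conversion); the repository location and the file name crlf.txt are invented for the example.

```shell
set -e
repo=$(mktemp -d)                        # hypothetical scratch repository
cd "$repo"
git init -q .

printf '*.txt text eol=lf\n' > .gitattributes
printf 'alpha\r\nbeta\r\n' > crlf.txt    # work-tree file with CRLF terminators
git add .gitattributes crlf.txt          # may warn that CRLF will be replaced by LF

# Count carriage returns in the index copy and in the work-tree copy.
index_cr=$(git cat-file -p :crlf.txt | tr -cd '\r' | wc -c | tr -d ' ')
worktree_cr=$(tr -cd '\r' < crlf.txt | wc -c | tr -d ' ')
echo "CRs in index copy: $index_cr, CRs in work-tree copy: $worktree_cr"
```

The index copy is normalized during git add, while the work-tree file keeps its CRLF terminators until something copies the index entry back out.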
There are older ways to control these, spelled crlf, -crlf, and crlf=input, which you probably should just avoid. See the gitattributes documentation for details.
There's one last complication here, and it is this: Git tries not to do unnecessary copy operations. That is, git checkout, git reset, git add, and so forth nominally copy from one place to another, but Git will attempt to cheat and not copy files in various cases. Sometimes, changing the EOL attributes in .gitattributes fools this optimizer: Git should copy (because the copying process will, this time, change line endings differently) but Git fails to copy (because it thinks, in its gitty little programming, that it can optimize away the copying). In these cases it often helps to just run touch on the work-tree file, so that its time-stamps change.
Most people find out what happened to their files by looking at the work-tree copies. Since the copying process can alter the work-tree copy (on the way out of the repository), this can be misleading.
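One way to avoid being misled is to look at the index copy directly instead of the work-tree copy. The sketch below assumes a Git new enough to have git ls-files --eol (added in Git 2.8, so not available in 1.8.3.1) and default configuration; the scratch repository and the file name notes.txt are made up for the illustration.

```shell
set -e
repo=$(mktemp -d)                      # hypothetical scratch repository
cd "$repo"
git init -q .

printf '*.txt text eol=crlf\n' > .gitattributes
printf 'alpha\nbeta\n' > notes.txt     # LF-only file in the work-tree
git add .gitattributes notes.txt       # may warn that LF will be replaced by CRLF

rm notes.txt
git checkout -- notes.txt              # index -> work-tree, conversion applied

# Report line endings as stored in the index (i/) and in the work-tree (w/).
eol_info=$(git ls-files --eol notes.txt)
echo "$eol_info"

# The same check by hand: count carriage returns in each copy.
index_cr=$(git cat-file -p :notes.txt | tr -cd '\r' | wc -c | tr -d ' ')
worktree_cr=$(tr -cd '\r' < notes.txt | wc -c | tr -d ' ')
echo "CRs in index copy: $index_cr, CRs in work-tree copy: $worktree_cr"
```

On older versions without --eol, piping git cat-file -p :path through a byte dumper such as od -c gives the same information about the index copy.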
[1] Technically, this is specific to git reset --mixed, git reset --hard, and file-oriented git reset commands (which effectively imply --mixed). Since Git 2.23, you can also use git restore to update an index copy from a commit copy, and in all versions of Git, you can use various clever sub-modes of git checkout to update various index copies.
[2] It would be logical for eol=lf to imply that Git should change CRLF to LF in this phase. The documentation implies that it does not, though.
Upvotes: 1