Mark VY

Reputation: 1671

git line endings behavior does not match documentation

I'm seeing git do things with line endings which seem to contradict everything I've seen on this site, the official docs, and even its own warning messages. (Or maybe I fail at reading comprehension.) Here's a small reproduction.

# repro.sh
git --version # 2.27.0.windows.1
mkdir empty
cd empty
echo '* text=auto !eol' > .gitattributes
echo hi > t.txt
git init
git config core.autocrlf false # I guess git attributes overrides this anyway?
git add t.txt # wait what?  warning: LF will be replaced by CRLF in t.txt???  I thought git likes LF?
git commit -m 'the plot thickens'
git cat-file -p `git rev-parse HEAD:t.txt` > temp.txt # get raw blob as it's really stored, I hope
od -c t.txt # original file in working dir ends in LF
od -c temp.txt # file from git also ends in LF, despite git warning??
# end of script

Does this make any sense? I thought that git sometimes likes to convert CRLF to plain LF on "git add" and does the reverse on checkout, but I've never heard of it turning plain LF into CRLF on git add, as the warning seems to threaten. And then it doesn't do it. The file that gets checked in is exactly what I have in my working directory, as verified by cat-file. So why warn at all? What's going on?

Upvotes: 2

Views: 280

Answers (1)

torek

Reputation: 487893

The message itself has always seemed a bit ... wrong? weird? ill-phrased?—I'm not sure what to call it. The intent of the message is to warn you that something seems inconsistent: that the way you see the file in the future may not be compatible with the way you see the file right now.

With that in mind, let's get to specifics:

echo '* text=auto !eol' > .gitattributes

First, on text=auto: this sets the text attribute to a string value auto, which tells Git: please guess whether each file is text or binary. I personally think this is a bad idea: you don't want Git to guess. You should just tell it. Git's guesses are usually pretty good, but I don't like my software to guess that much. :-)

In any case, let's move on to !eol: this means to set the eol attribute to the unspecified state. This may not be what you want. It starts unspecified, so if you don't want to specify it, you can just leave it unspecified. The ! prefix exists so that you can correct some previous setting: for instance, if the default should be eol=lf, you might have:

* eol=lf

but since JPG files should not be munged we can then override this just for *.jpg:

*.jpg !eol

(although *.jpg binary is probably better: it means -diff -merge -text and with -text the eol attribute becomes irrelevant).
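To see how these overrides resolve, git check-attr can be asked directly. This is a minimal sketch in a throwaway repository: the file names are hypothetical, and the *.png binary line is an extra added here only to put both approaches side by side.

```shell
# Scratch repository; nothing here touches an existing checkout.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Default eol=lf, undone for *.jpg with !eol, and (as an added
# illustration, not from the text above) *.png marked binary instead.
printf '%s\n' '* eol=lf' '*.jpg !eol' '*.png binary' > .gitattributes

# check-attr does not need the files to exist; it only matches path names.
attrs=$(git check-attr eol text -- a.txt b.jpg c.png)
echo "$attrs"
```

For b.jpg the eol attribute should be reported as unspecified, while c.png keeps eol: lf but has text: unset, which is what makes the eol value irrelevant for it.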

So, what we have so far is: a file is text if and only if Git guesses that it's text, and the eol attribute is unspecified.
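That summary can be checked against Git itself. A small sketch, assuming a fresh scratch directory rather than your real repository:

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
echo '* text=auto !eol' > .gitattributes
echo hi > t.txt

# Ask Git what it resolves for the two attributes discussed so far.
resolved=$(git check-attr text eol -- t.txt)
echo "$resolved"
```

The text attribute should come back as auto and eol as unspecified, the same as if !eol had never been written.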

git config core.autocrlf false # I guess git attributes overrides this anyway?

The text attribute specifically overrides this one. The gitattributes documentation says, in part:

If the text attribute is unspecified, Git uses the core.autocrlf configuration variable to determine if the file should be converted.

This doesn't say what happens if the text attribute is specified (which it is, to auto), but going back just a bit, we find that with text=auto:

If Git decides that the content is text, its line endings are converted to LF on checkin. When the file has been committed with CRLF, no conversion is done.

This talks only about checkin. The documentation doesn't say this, but that's really during git add, which is when Git will, maybe, turn CRLF into LF-only.
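Here is a rough way to watch that checkin conversion happen at git add time. It is a sketch in a scratch repository, and crlf.txt is a hypothetical file name: the working tree copy has CRLF, and the blob that lands in the index should be one byte shorter.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
echo '* text=auto' > .gitattributes

printf 'hi\r\n' > crlf.txt            # 4 bytes: h i CR LF
git add crlf.txt 2>/dev/null          # any line-ending warning is silenced

worktree_size=$(wc -c < crlf.txt | tr -d ' ')
index_size=$(git cat-file -s :crlf.txt)   # size of the staged blob
echo "working tree: $worktree_size bytes, index: $index_size bytes"
```

No commit is needed for this: the conversion has already happened by the time the file is in the index.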

git add t.txt # wait what?  warning: LF will be replaced by CRLF in t.txt???

Git emits these warnings during git add (unless they're suppressed through other configurations) when it notices anything suspicious. The warnings are, or at least include, what you've seen, which I sometimes call ill-phrased (for lack of a better term). I don't have a better way to phrase them that's not so verbose that it becomes problematic, though.

Warning: verbose = problematic description here 😀

There are only two built-in LF/CRLF conversions:

  • An "on the way in" conversion that turns CRLF into LF-only: this happens only during git add, and then only if it is, or seems to be, called-for.

  • An "on the way out" conversion that turns LF-only into CRLF: this happens during git checkout, git reset --hard, git restore (if run with an explicit or implied --worktree), and other similar operations. But, like the on-the-way-in CRLF-to-LF conversion, it only happens if it is or seems-to-be called-for.

What is happening here is that Git suspects an LF-to-CRLF conversion will occur on the way out, some time in the future. Whether your setup is configured this way right now depends on your platform: you have !eol, so nothing in .gitattributes chooses a line ending, and the choice falls to your configuration and its defaults. At first I assumed you were on Linux, where I think it would not be configured this way, but your version string says windows, so maybe it is. I don't use Windows, so I'm not sure what the defaults are there.

Meanwhile, though, t.txt, as seen in both your index and your working tree, has pure LF-only line endings. If Git were to perform an on-the-way-out LF-to-CRLF conversion (from index copy to working tree copy), your t.txt file in your working tree would suddenly have CRLF line endings.

That's what this warning message means. If, in the future, Git does text conversion on the file, the result of extracting what's now in Git's index won't match the actual file in your working tree right now. The one conversion that Git can do here is to turn LF-only into CRLF, and t.txt is currently LF-only.
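The warning can be provoked on any platform by asking for CRLF on checkout explicitly. This sketch assumes eol=crlf in place of the !eol from the question, purely so the result does not depend on platform defaults. The exact wording of the warning differs between Git versions.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config core.safecrlf warn              # the default, made explicit here
echo '* text eol=crlf' > .gitattributes    # assumption: force CRLF on the way out

printf 'hi\n' > t.txt                      # LF-only working tree file
warning=$(LC_ALL=C git add t.txt 2>&1)     # capture what git add prints
echo "$warning"

staged_size=$(git cat-file -s :t.txt)      # the index copy is still LF-only
echo "staged blob: $staged_size bytes"
```

The staged blob stays at three bytes: nothing was converted on the way in, and the warning is only about what a later extraction would do.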

On to the last few steps

git commit -m 'the plot thickens'

The plot hasn't really thickened, here. All conversions happened before this point. The commit command merely takes the t.txt file stored in Git's index (that's the only file in Git's index at this point since the repository is all-new) and makes a commit out of that.

git cat-file -p `git rev-parse HEAD:t.txt` > temp.txt
# get raw blob as it's really stored, I hope

This does, yes. You could equally grab :t.txt from the index, or use git ls-files --stage to get the blob hash ID.

Note that the git commit step has not modified the working tree copy. It's still untouched. To force Git to extract the index copy back to the working tree, first remove the working tree copy, then use any Git command that will create it afresh. This will run the extraction step, which will—or won't—turn LF-only to CRLF as requested by your various configurations:

rm t.txt
git checkout -- t.txt

You can now use od or similar to see what happened. Did \n become \r\n, or not? That tells you how Git interprets your current setup (core.autocrlf, core.eol, and the various attributes in .git/info/attributes and .gitattributes) for this file.
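To compare how different settings affect that extraction, the remove-and-checkout step can be wrapped in a small function. This is only a sketch: roundtrip is a hypothetical helper, and the two attribute lines are example inputs, not the configuration from the question.

```shell
# roundtrip (hypothetical helper): stage an LF-only file under the given
# .gitattributes line, delete it, re-extract it from the index, and print
# the byte count of the new working tree copy.
roundtrip() {
  dir=$(mktemp -d)
  (
    cd "$dir" || exit 1
    git init -q
    printf '%s\n' "$1" > .gitattributes
    printf 'hi\n' > t.txt
    git add t.txt 2>/dev/null       # silence the line-ending warning
    rm t.txt
    git checkout -- t.txt           # runs the on-the-way-out step
    wc -c < t.txt | tr -d ' '
  )
}

crlf_size=$(roundtrip '* text eol=crlf')
lf_size=$(roundtrip '* text eol=lf')
echo "eol=crlf: $crlf_size bytes, eol=lf: $lf_size bytes"
```

A four-byte result means \n became \r\n on the way out, and three bytes means the file was extracted unchanged. Passing '* text=auto !eol' instead shows what your own platform and configuration decide.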

Note: git ls-files --eol has, since Git 2.8, been able to tell you more about what's going on here. It will, separately:

  • examine what's in the index;
  • examine what's in the working tree; and
  • see which attributes apply

to each file that is presently in the index.
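A brief sketch of what that looks like, again in a scratch repository with the attributes from the question:

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
echo '* text=auto !eol' > .gitattributes
printf 'hi\n' > t.txt
git add t.txt 2>/dev/null

# One line per indexed file: i/<index eol>  w/<worktree eol>  attr/<attributes>  path
eol_info=$(git ls-files --eol t.txt)
echo "$eol_info"
```

Here both the i/ and w/ columns should read lf, with text=auto in the attr/ column. A mismatch between i/ and w/ after a fresh checkout is the situation the warning was describing.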

Upvotes: 3
