Reputation: 93
I'm using GitHub Desktop on Windows 10. I want to set up a Git repository which will contain some text files (.md or .adoc) and Freemind mindmaps (.mm). Freemind uses an XML-based file format with Unix-style line endings (LF).
Over the last 4 hours, I think I have read every StackOverflow discussion and every piece of Git documentation about EOL normalisation one can find online, and it still drives me crazy! Many discussions seem to be outdated and opinions seem to contradict each other. Here's how far I got:
- core.autocrlf and core.safecrlf at their default values
- a .gitattributes file with * text=auto (I'm fine with auto conversion for my text files)
- *.mm text eol=LF
What's puzzling me is this warning:
> git add Mindmap.mm
warning: LF will be replaced by CRLF in Mindmap.mm.
The file will have its original line endings in your working directory.
As far as I understand it, * text=auto is equivalent to core.autocrlf=true and ensures that all EOLs are converted to LF on commit, so LF->LF in this case. And *.mm text eol=LF ensures that LFs are preserved at checkout: LF->LF in the other direction, too. No CRLFs involved! So why is Git warning me that some conversion will result in CRLFs?
Question 1: I'd like to make sure that I won't get cross-platform problems with Unix users when I go public with my project on GitHub. What is the best practice for my case? If I did everything right, can I ignore the warning?
Question 2: Under certain circumstances .mm files can also contain CRLFs. I could, of course, handle them as binary files, but then I would no longer see any differences in GitHub Desktop. Is there a way to still treat them as text files while preserving mixed line endings (LF and CRLF)?
Any hint is greatly appreciated!
Upvotes: 2
Views: 969
Reputation: 488253
I don't use Windows, and therefore hesitate a bit to make recommendations, but I can describe the various mechanisms here, and make at least a few. :-)
Operations that transform file data, including anything that modifies line endings, apply (in general; there are some specific exceptions) at the time the data are extracted from the repository into the work-tree, which is basically git checkout (but see notes towards the end), or added into the repository, which is basically git add.
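To make the two conversion points concrete, here is a toy Python model of what happens to a file that Git treats as text. It is not Git's code: to_index and to_worktree are made-up names, and the sketch assumes the common case where add turns CRLF into LF, and checkout turns lone LFs into CRLF only when CRLF output has been chosen for the file.

```python
import re


def to_index(worktree_data: bytes) -> bytes:
    """Toy model of 'git add' for a text file: CRLF pairs become LF."""
    return worktree_data.replace(b"\r\n", b"\n")


def to_worktree(index_data: bytes, eol: str) -> bytes:
    """Toy model of 'git checkout' for a text file.

    eol is 'crlf' or 'lf'. With 'crlf', every LF not already preceded
    by a CR gets a CR in front of it; with 'lf', the data is left alone.
    """
    if eol == "crlf":
        return re.sub(rb"(?<!\r)\n", b"\r\n", index_data)
    return index_data


# A file typed on Windows is stored with LF only...
stored = to_index(b"line 1\r\nline 2\r\n")
# ...and gets CRLF back in the work-tree only if CRLF output was chosen.
crlf_copy = to_worktree(stored, "crlf")
lf_copy = to_worktree(stored, "lf")
```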
In order to transform file data, Git must know which files get transformed and what transforms to apply. Git has to classify each file to decide what to do.
Some files are clearly binary, some are almost certainly text, and some are quite ambiguous. Git will guess if it has to. You can (and I guess, perhaps, used to have to?) tell it to guess by setting core.autocrlf=true or core.autocrlf=input, but see the next paragraph.
If you have a .gitattributes file, you can tell Git about files based on their path names, e.g., that *.txt should always be treated as text files and *.bin files should never be treated as text files. This gives you much finer control, because not only can you match based on path names like this, you can also write any of these:
# definitely text
*.ex1 text
# definitely not text
*.ex2 -text
# please guess for me based on file contents
*.ex3 text=auto
# don't mention *.ex4: check core.autocrlf to decide whether to guess
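A rough Python model of how such lines could be looked up for a given path, assuming only the simple rules I'm fairly sure of: blank lines and lines starting with # are skipped (Git does not support trailing comments on an attribute line), a pattern without a slash is matched against the file name only, and a later matching line overrides an earlier one, attribute by attribute. Real Git matching has more cases (leading slashes, ** and so on); fnmatch is only a stand-in, and attributes_for is a made-up name.

```python
import fnmatch
import posixpath


def parse_attrs(tokens):
    """Turn tokens like 'text', '-text', '!text', 'eol=lf' into a dict."""
    attrs = {}
    for tok in tokens:
        if tok.startswith("-"):
            attrs[tok[1:]] = False          # unset, e.g. -text
        elif tok.startswith("!"):
            attrs[tok[1:]] = None           # back to unspecified
        elif "=" in tok:
            name, value = tok.split("=", 1)
            attrs[name] = value             # set to a value, e.g. text=auto
        else:
            attrs[tok] = True               # set, e.g. text
    return attrs


def attributes_for(path, gitattributes_text):
    """Collect attributes for path; later matching lines win."""
    result = {}
    for line in gitattributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                         # blank line or full-line comment
        pattern, *tokens = line.split()
        # Simplification: a pattern without '/' is matched against the file name only.
        target = path if "/" in pattern else posixpath.basename(path)
        if fnmatch.fnmatchcase(target, pattern):
            result.update(parse_attrs(tokens))
    # Attributes reset to unspecified are dropped from the answer.
    return {k: v for k, v in result.items() if v is not None}
```

For example, with the lines * text=auto, *.mm -text and path/to/some/file -text, a README gets text=auto while every .mm file and that one listed path get -text.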
Just based on this part, I would suggest that core.autocrlf is never good to use, because guessing seems suspect in the first place. At least with text=auto you have an obvious place requesting the guessing, though.
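For the curious, here is a simplified Python sketch of the kind of content check that "guessing" means for text=auto. It is loosely modeled on Git's convert.c as I understand it: a NUL byte or a lone CR means binary, and so does a noticeable share of other control characters. The exact rules and thresholds may differ between Git versions, and looks_binary is a hypothetical name.

```python
def looks_binary(data: bytes) -> bool:
    """Simplified text=auto guess: True means 'leave line endings alone'."""
    printable = nonprintable = 0
    i = 0
    while i < len(data):
        c = data[i]
        if c == 0x0D:                                  # CR
            if i + 1 < len(data) and data[i + 1] == 0x0A:
                i += 2                                 # CRLF pair: ordinary line ending
                continue
            return True                                # lone CR: treat as binary
        if c == 0x00:
            return True                                # NUL byte: treat as binary
        if c == 0x0A:
            pass                                       # LF is counted as neither
        elif c in (0x08, 0x09, 0x0C, 0x1B) or (c >= 0x20 and c != 0x7F):
            printable += 1                             # text, BS, tab, FF, ESC, non-ASCII
        else:
            nonprintable += 1                          # other control characters and DEL
        i += 1
    # Binary if control characters exceed roughly 1/128 of the printable ones.
    return (printable >> 7) < nonprintable
```

An XML mindmap with LF or CRLF line endings passes as text under this model; a file with a stray lone CR does not, which is one way an ambiguous file can silently escape conversion.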
Independent of the guess-or-definite setting, you can list eol=crlf or eol=lf after a path. This enables conversion, i.e., the file is treated as if it were text when it comes to deciding whether to mess with line endings on extraction (git checkout) and insertion (git add). What winds up in the work-tree is either CRLF or LF-only. In either case, CRLF in the work-tree is replaced with LF-only during git add. I suspect, but have not tested, that this does not affect git diff.
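Here is a simplified Python model of which line ending a file treated as text ends up with in the work-tree, based on my reading of the gitattributes and git config documentation. The documented eol values are lf and crlf in lowercase; as far as I know Git compares them literally, so something like eol=LF would be treated as if eol were not set, and Git would fall back to core.autocrlf and core.eol (native line endings, which on Windows means CRLF). If that is right, it would also explain the "LF will be replaced by CRLF" warning in the question. worktree_eol and its parameters are made up for this sketch.

```python
def worktree_eol(eol_attr=None, core_autocrlf="false", core_eol="native", native="crlf"):
    """Line ending a file treated as text gets on checkout (simplified model).

    eol_attr is the value of the eol attribute, or None if it is not set.
    Only the exact strings 'lf' and 'crlf' are recognized; any other value
    (for example 'LF') is treated as if the attribute were not set.
    native is the platform default; 'crlf' models Windows.
    """
    if eol_attr in ("lf", "crlf"):
        return eol_attr
    # No usable eol attribute: fall back to the configuration.
    if core_autocrlf == "true":
        return "crlf"
    if core_autocrlf == "input":
        return "lf"
    if core_eol in ("lf", "crlf"):
        return core_eol
    return native  # core.eol unset or 'native'
```

Under this model, worktree_eol("lf") gives "lf" on any platform, while worktree_eol("LF") on Windows gives "crlf".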
(The old crlf, -crlf, and crlf=input settings should certainly no longer be used, but if you do use them, they act as described in the gitattributes documentation.)
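For reference, the backwards-compatibility section of the gitattributes documentation maps the old spellings onto the new ones: crlf means text, -crlf means -text, and crlf=input means eol=lf. A tiny Python helper (hypothetical, not part of Git) that rewrites old-style lines accordingly:

```python
# Mapping taken from the backwards-compatibility notes in the gitattributes docs.
OLD_TO_NEW = {
    "crlf": "text",
    "-crlf": "-text",
    "crlf=input": "eol=lf",
}


def modernize(line: str) -> str:
    """Rewrite old crlf tokens in one .gitattributes line; other tokens are kept."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return line                      # blank lines and comments pass through
    pattern, *tokens = stripped.split()
    return " ".join([pattern] + [OLD_TO_NEW.get(tok, tok) for tok in tokens])
```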
Now, the obvious problem you've highlighted is that using -text to mark a file as "never touch with autocrlf or other guessed-at transformations" interacts with git diff, because git diff also has to guess whether a file is text before producing a diff. Here, we can go back to the gitattributes documentation, where we find that path names can have a diff attribute:
# not text for crlf treatment, but text for diff
*.ex5 -text diff
# not text for either one
*.ex6 -text -diff
# definitely text for crlf, but binary for diff
*.ex7 text -diff
# use my diff driver; no opinion about text
*.ex8 diff=my-diff-driver
Leaving out diff entirely makes Git guess, just as it does with crlf treatment.
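A simplified Python model of the diff side, assuming what I believe git diff does when it has to guess: it looks for a NUL byte in roughly the first 8000 bytes. Note that the text attribute does not appear at all, which is the point of the examples above. diff_as_text is a made-up name, and the sketch ignores other special cases, such as a named diff driver that has its own binary setting.

```python
FIRST_FEW_BYTES = 8000  # how much of the file the guess inspects (assumed, see lead-in)


def diff_as_text(diff_attr, data: bytes) -> bool:
    """Simplified model of whether git diff shows a line-by-line diff.

    diff_attr: True for 'diff', False for '-diff', a string for
    'diff=driver', None if unspecified. The text attribute plays no role.
    """
    if diff_attr is False:
        return False                    # '-diff': always "Binary files differ"
    if diff_attr is True or isinstance(diff_attr, str):
        return True                     # set, or handed to a named driver
    return b"\0" not in data[:FIRST_FEW_BYTES]  # unspecified: guess from content
```

So a .mm file marked only -text still gets a normal diff, as long as it contains no NUL bytes near the start.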
Note that path names in .gitattributes need not be patterns: you can list:
path/to/some/file -text
path/to/another/file text
in case Git guesses wrong about some files.
I have not yet mentioned core.safecrlf, but I think the discussion in the git config documentation is pretty adequate here. It consists of a bunch of special tests run during various commands, doing conversions in both directions a bit early, with the final checkout stage going to a temporary file that is immediately thrown out, just to see if files that are in the work-tree right now would stay the way they are right now. That is, if you did a git add path; git commit -m dummy; rm path; git checkout -- path right now, would the file in path change contents? If so, the conversion is not "safe".
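The round-trip question above can be sketched in Python like this. It is a toy model, not how Git implements core.safecrlf: normalize stands for the add direction, checkout for the checkout direction (both hypothetical names), and a conversion counts as safe only if the file would come back byte-for-byte identical.

```python
import re


def normalize(worktree_data: bytes) -> bytes:
    """Add direction: CRLF pairs become LF in the index."""
    return worktree_data.replace(b"\r\n", b"\n")


def checkout(index_data: bytes, eol: str) -> bytes:
    """Checkout direction: lone LFs become CRLF when eol == 'crlf'."""
    if eol == "crlf":
        return re.sub(rb"(?<!\r)\n", b"\r\n", index_data)
    return index_data


def conversion_is_safe(worktree_data: bytes, eol: str) -> bool:
    """True if add followed by checkout would give back identical bytes."""
    return checkout(normalize(worktree_data), eol) == worktree_data
```

An LF-only file that would be checked out with CRLF fails this test, which is the situation a "LF will be replaced by CRLF" warning describes; a file with mixed endings fails in either direction.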
Last, I should mention a few more special cases. Conversions (both line endings and smudge filters) are done any time the file comes out of the index; this includes the git checkout-index command. They can also be done on purpose during operations that bypass the index: git cat-file, by adding --textconv or --path= or --filters; git show, with --textconv, although the details vary a bit based on specific Git version (many of these options are not in older versions of Git). Similarly, conversions (line endings and clean filters) are done any time the file goes into the index, but can also be done or suppressed in git hash-object, using --path or --no-filters.
Upvotes: 2
Reputation: 3284
Unsetting text for .mm files will stop Git from doing crlf conversions on them, but it won't start treating them as binary files, so git diff and other features will still behave correctly.
* text=auto
*.mm -text
That should solve your second question. However, because Git won't be enforcing the line endings for .mm files, it will likely cause some headaches when you go public and contributors start modifying them on OS X and Linux. If you can describe the rules for .mm line endings, maybe the config can be tweaked, or perhaps a commit hook can help you enforce them. Other than that, I don't see how you can solve both your first and second questions at the same time.
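If you do go the hook route, a minimal Python sketch could look like this. It classifies the staged content of each .mm file as lf, crlf, mixed or none; ALLOWED_MM_STYLES is a placeholder for whatever rule you settle on, and the function names are hypothetical. It reads the index copy with git diff --cached --name-only -z and git cat-file blob :path; wiring it into .git/hooks/pre-commit (and exiting non-zero when problems are found) is left out.

```python
import subprocess

# Hypothetical project rule: which line-ending styles a .mm file may have.
ALLOWED_MM_STYLES = {"lf", "none"}


def eol_style(data: bytes) -> str:
    """Classify line endings as 'lf', 'crlf', 'mixed' or 'none'."""
    crlf = data.count(b"\r\n")
    lone_lf = data.count(b"\n") - crlf
    if crlf and lone_lf:
        return "mixed"
    if crlf:
        return "crlf"
    if lone_lf:
        return "lf"
    return "none"


def staged_mm_problems() -> list:
    """Return (path, style) for staged .mm files whose style is not allowed."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "-z", "--diff-filter=ACM"],
        capture_output=True, check=True,
    ).stdout.decode("utf-8")
    problems = []
    for path in filter(None, out.split("\0")):
        if not path.endswith(".mm"):
            continue
        # ":<path>" names the copy of the file that is staged in the index.
        staged = subprocess.run(
            ["git", "cat-file", "blob", ":" + path], capture_output=True, check=True
        ).stdout
        style = eol_style(staged)
        if style not in ALLOWED_MM_STYLES:
            problems.append((path, style))
    return problems
```

With *.mm -text in place, the staged bytes are exactly what was in the work-tree, so checking the index copy checks what contributors actually committed.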
Upvotes: 1