JohnSpeeks

Reputation: 750

Custom line-endings in git (other than LF and CR+LF)

I was hired as a consultant to work with a terrible in-house DSL used by a large corporation.

I say terrible because, instead of ending each line of code with a carriage return or linefeed, the language separates lines with the five-character ASCII string <EOL>. These files are thousands of "lines" long. Any embedded carriage returns or linefeeds tend to crash their interpreter.

I cannot change their interpreter or language, but I need to work with a massive (>100 MB) codebase written in this language.

Before making any changes to this code, I want to put it into a git repository to track it. Is there a way to tell git that the string <EOL> represents an end-of-line, much like you can specify LF or CR+LF with core.eol=lf? For example, core.eol="<EOL>". If so, this would make my life rather easier in two ways:

  1. It would make merges and diffs work intelligently; git would know where the "lines" are.
  2. I could (for example) check in the original code with <EOL> as the line ending, then check it out on another machine with core.eol=lf set, and git would convert back and forth automatically. (I could use a regular text editor and regular tools!)

I do recognize that this is a niche edge case. I also understand that I could add an intermediate processing step to convert back and forth before interacting with git, but I want to avoid that unless absolutely necessary, as I'd prefer to import their existing codebase directly into git without pre-processing it first.

If this feature is not available, I might even prefer creating a custom version of git to adding an extra processing step, so if anyone knows what complexities might be involved in that, I'd be interested in learning about those.

Upvotes: 2

Views: 93

Answers (2)

ephemient

Reputation: 204758

This custom filter setup will result in *.dsl files containing <EOL> in Git storage, but \n when checked out in your working directory. With the textconv setting, tools such as git diff will compare the converted (i.e. \n) versions. Is that what you want?

~/.gitconfig or .git/config

[filter "crazy-eol"]
    clean = awk 'BEGIN{ORS="<EOL>"}1'
    smudge = awk 'BEGIN{RS="<EOL>"}1'
[diff "crazy-eol"]
    textconv = awk 'BEGIN{RS="<EOL>"}1'

.gitattributes or .git/info/attributes

*.dsl filter=crazy-eol diff=crazy-eol
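
One caveat on the awk one-liners: POSIX only defines single-character RS values, so the multi-character RS="<EOL>" relies on implementations such as gawk or mawk that treat it as a regular expression. If that is a concern, the same two conversions can be sketched as a small stdin-to-stdout Python filter. The script name eol_filter.py and the function names are hypothetical, and the sketch assumes the literal string <EOL> never appears inside a line.

```python
#!/usr/bin/env python3
"""Hypothetical eol_filter.py: stdin-to-stdout filter for Git clean/smudge."""
import sys

MARKER = b"<EOL>"  # the DSL's line separator, as described in the question


def clean(working: bytes) -> bytes:
    """Working tree -> Git storage: turn every LF into the marker."""
    return working.replace(b"\n", MARKER)


def smudge(stored: bytes) -> bytes:
    """Git storage -> working tree: turn every marker into LF."""
    return stored.replace(MARKER, b"\n")


# Only act when invoked as "eol_filter.py clean" or "eol_filter.py smudge";
# with any other arguments the script does nothing.
if __name__ == "__main__" and sys.argv[1:] in (["clean"], ["smudge"]):
    convert = clean if sys.argv[1] == "clean" else smudge
    # Use the raw byte streams so Python performs no newline translation.
    sys.stdout.buffer.write(convert(sys.stdin.buffer.read()))
```

It could then replace the awk commands in the filter section, for example clean = python3 /path/to/eol_filter.py clean and smudge = python3 /path/to/eol_filter.py smudge, where the path is a placeholder.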

Upvotes: 4

torek

Reputation: 488193

There is a way to do this. It's not convenient at all, and it runs the risk of making irreversible changes if the literal string <EOL> really does appear inside a line (although given your description of the DSL it seems like this cannot happen).

You cannot do it with the core.eol settings, though. You will need to use smudge and clean filters. Look at the description in the gitattributes documentation. Your two filters will convert <EOL> to line-feed and vice versa. This is exactly what the core.eol and core.autocrlf and text conversion filters do: they replace \r\n with \n in one direction or another, just as you would replace <EOL> with \n in one direction or another. If you look a bit further down in the documentation, in the "Interaction between checkin/checkout attributes" section, you will see that Git simply has a text filter that acts like a clean and/or smudge filter, as part of a pipeline.
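
To make the reversibility risk concrete, here is a minimal Python sketch of the two replacements and a check that a round trip reproduces its input. All function names are made up for illustration, and the sample strings are placeholders rather than real DSL code.

```python
MARKER = "<EOL>"


def to_working(stored: str) -> str:
    """What a smudge filter would do: marker -> LF."""
    return stored.replace(MARKER, "\n")


def to_stored(working: str) -> str:
    """What a clean filter would do: LF -> marker."""
    return working.replace("\n", MARKER)


def stored_survives(stored: str) -> bool:
    """True when stored -> working -> stored gives back the same text."""
    return to_stored(to_working(stored)) == stored


def working_survives(working: str) -> bool:
    """True when working -> stored -> working gives back the same text."""
    return to_working(to_stored(working)) == working


print(stored_survives("alpha<EOL>beta<EOL>"))   # True
print(stored_survives("alpha\nbeta<EOL>"))      # False: the raw LF comes back as <EOL>
print(working_survives("alpha\nbeta\n"))        # True
print(working_survives("say <EOL> here\n"))     # False: the literal <EOL> comes back as LF
```

Running a check like stored_survives over the existing files before importing them would show whether any file would be altered by the filters.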

Before you do any of this, consider ...

Before you bother with this, consider just doing a one-time pass of your own. Once you have the files in "normal" form, you can Git-ize them. You can always run your own sanitizer before working on these files. Then, once you have the files ready to go, you run them through an "insanitizer" to go back to the crazy <EOL> format, all outside Git entirely.

I think this (external sanitizer/insanitizer) will be easier to work with, really.
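
As a rough sketch of what such a sanitizer/insanitizer pair might look like in Python: the function names are hypothetical, the *.dsl pattern is only an assumed file extension, and files are rewritten in place, so try it on a copy of the codebase first.

```python
from pathlib import Path

MARKER = b"<EOL>"


def convert_tree(root: Path, old: bytes, new: bytes, pattern: str = "*.dsl") -> int:
    """Replace `old` with `new` in each matching file under `root`.

    Returns the number of files that were actually rewritten.
    """
    changed = 0
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        data = path.read_bytes()  # bytes, so no newline translation happens
        converted = data.replace(old, new)
        if converted != data:
            path.write_bytes(converted)
            changed += 1
    return changed


def sanitize(root: Path) -> int:
    """<EOL> -> LF, to get files that ordinary tools and Git handle well."""
    return convert_tree(root, MARKER, b"\n")


def insanitize(root: Path) -> int:
    """LF -> <EOL>, to go back to the format the interpreter expects."""
    return convert_tree(root, b"\n", MARKER)
```

You would call sanitize(Path("some/checkout")) once before committing, and insanitize on an export of the tree before handing files back to the interpreter.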

Upvotes: 2
