Reputation: 750
I was hired as a consultant to work with a terrible in-house DSL used by a large corporation.
I say terrible because, instead of carriage returns or linefeeds to end each line of code, lines of code are separated by the five-character ASCII string <EOL>. These files are thousands of "lines" long. Any embedded carriage returns or linefeeds tend to crash their interpreter.
I cannot change their interpreter or language, but I need to work with a massive (>100 MB) codebase written in this language.
Before making any changes to this code, I want to put it into a git repository to track it. Is there a way to tell git that the string <EOL> represents an end-of-line, much like you can specify LF or CR+LF with core.eol=lf? For example, core.eol="<EOL>". If so, this would make my life rather easier in two ways:
<EOL> as the line ending, then check it out on another machine with core.eol=lf set, and git would convert back and forth automatically. (I could use a regular text editor and regular tools!)
I do recognize that this is a niche edge case. I also understand I could add an intermediate processing step to convert back and forth before interacting with git, but I want to avoid that unless absolutely necessary, as I'd prefer to import their existing codebase directly into git without pre-processing it first.
If this feature is not available, I might even prefer creating a custom version of git to adding an extra processing step, so if anyone knows what complexities might be involved in that, I'd be interested in learning about those.
Upvotes: 2
Views: 93
Reputation: 204758
This custom filter setup will result in *.dsl files containing <EOL> in Git storage, but \n when checked out in your working directory. Tools such as git diff will operate on the checked-out versions (e.g. \n). Is that what you want?
~/.gitconfig or .git/config:
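[filter "crazy-eol"]
    # Inner double quotes are escaped because Git's config parser strips unescaped ones.
    # A multi-character RS needs gawk or mawk; POSIX leaves it unspecified.
    clean = awk 'BEGIN{ORS=\"<EOL>\"}1'
    smudge = awk 'BEGIN{RS=\"<EOL>\"}1'
[diff "crazy-eol"]
    textconv = awk 'BEGIN{RS=\"<EOL>\"}1'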
.gitattributes or .git/info/attributes:
*.dsl filter=crazy-eol diff=crazy-eol
Upvotes: 4
Reputation: 488193
There is a way to do this. It's not convenient at all, and it runs the risk of making irreversible changes if the literal string <EOL> really does appear inside a line (although given your description of the DSL it seems like this cannot happen).
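If you want to check that risk before committing to anything, here is a minimal sketch in Python. It assumes the conversion is a plain byte-for-byte replacement of <EOL> with a line feed and back; the function names are made up for illustration.

```python
EOL = b"<EOL>"
LF = b"\n"

def stored_form_is_reversible(stored: bytes) -> bool:
    """<EOL>-separated data survives <EOL> -> LF -> <EOL> only if it holds no raw LF."""
    return stored.replace(EOL, LF).replace(LF, EOL) == stored

def working_form_is_reversible(working: bytes) -> bool:
    """LF-separated data survives LF -> <EOL> -> LF only if no line contains <EOL>."""
    return working.replace(LF, EOL).replace(EOL, LF) == working
```

Running the first check over the existing files tells you whether importing them is lossless; the second is the case described above, where someone types a literal <EOL> into an already-converted file.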
You cannot do it with the core.eol settings, though. You will need to use smudge and clean filters. Look at the description in the gitattributes documentation. Your two filters will convert <EOL> to line-feed and vice versa. This is, in fact, exactly what the core.eol and core.autocrlf and text conversion filters do: they replace \r\n with \n in one direction or another, just as you would replace <EOL> with \n in one direction or another. In fact, if you look a bit further down in the documentation, in the "Interaction between checkin/checkout attributes" section, you will see that Git simply has a text filter that acts like a clean and/or smudge filter, as part of a pipeline.
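As a rough sketch of what such a filter pair could look like in Python: the version below assumes the repository keeps the <EOL> form and the working tree gets line feeds (swap the two functions if you want it the other way around). The name run_filter and the sample DSL text are invented for the example.

```python
import io

EOL = b"<EOL>"

def clean(data: bytes) -> bytes:
    """Working tree -> repository: every line feed becomes <EOL>."""
    return data.replace(b"\n", EOL)

def smudge(data: bytes) -> bytes:
    """Repository -> working tree: every <EOL> becomes a line feed."""
    return data.replace(EOL, b"\n")

def run_filter(mode: str, src, dst) -> None:
    """Read all of src, convert it according to mode, and write the result to dst."""
    convert = {"clean": clean, "smudge": smudge}[mode]
    dst.write(convert(src.read()))

# Demonstration with in-memory streams standing in for stdin and stdout.
stored = io.BytesIO(b"SET X 1<EOL>PRINT X<EOL>")  # made-up DSL content
checked_out = io.BytesIO()
run_filter("smudge", stored, checked_out)
```

A real filter would be a small script that calls run_filter(sys.argv[1], sys.stdin.buffer, sys.stdout.buffer), since Git feeds the file content to the filter command on standard input and reads the converted content from its standard output.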
Before you bother with this, consider just doing a one-time pass of your own. Once you have the files in "normal" form, you can Git-ize them. You can always run your own sanitizer before working on these files. Then, once you have the files ready to go, you run them through an "insanitizer" to go back to the crazy <EOL> format, all outside Git entirely.
I think this (external sanitizer/insanitizer) will be easier to work with, really.
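For what it's worth, the external pair does not need to be much code. Here is a minimal sketch in Python, assuming the files can be found with a glob such as *.dsl (an assumption, adjust to whatever the real extension is) and that each file fits in memory; convert_tree, sanitize and insanitize are names made up for this example.

```python
import tempfile
from pathlib import Path

EOL = b"<EOL>"
LF = b"\n"

def convert_tree(root: Path, pattern: str, old: bytes, new: bytes) -> int:
    """Rewrite every file under root matching pattern, replacing old with new.

    Returns the number of files whose contents changed.
    """
    changed = 0
    for path in sorted(root.rglob(pattern)):
        if not path.is_file():
            continue
        before = path.read_bytes()
        after = before.replace(old, new)
        if after != before:
            path.write_bytes(after)
            changed += 1
    return changed

def sanitize(root: Path, pattern: str = "*.dsl") -> int:
    """<EOL> format -> ordinary line feeds, for editing and committing."""
    return convert_tree(root, pattern, EOL, LF)

def insanitize(root: Path, pattern: str = "*.dsl") -> int:
    """Ordinary line feeds -> <EOL> format, for handing back to the interpreter."""
    return convert_tree(root, pattern, LF, EOL)

# Demonstration on a throwaway directory with made-up file contents.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "example.dsl"
    sample.write_bytes(b"SET X 1<EOL>PRINT X<EOL>")
    sanitize(Path(tmp))
    normal = sample.read_bytes()    # LF-separated, ready for Git and editors
    insanitize(Path(tmp))
    restored = sample.read_bytes()  # back in the <EOL> format
```

Both directions rewrite files in place, so it is worth trying this on a copy of the codebase first.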
Upvotes: 2