Guven Degirmenci

Reputation: 712

Git, how to ignore switched lines?

I have a config file that changes when I start the server, but the only change is that some lines switch places. This does not affect the content of the file, and it doesn't matter where the lines are as the server rewrites the data where it got them. How can I prevent that?

I thought of keeping my config files in a static folder and copying them over before I commit, but I don't know whether Git can do that automatically.

Example of what I am talking about:

Before I run the server:

    frosted-ice:
      enabled: true
      delay:
        min: 20
        max: 40
    lootables:
      auto-replenish: false
      restrict-player-reloot: true
      reset-seed-on-fill: true
      max-refills: -1
      refresh-min: 12h
      refresh-max: 2d

After I run the server:

    lootables:
      auto-replenish: false
      restrict-player-reloot: true
      refresh-min: 12h
      refresh-max: 2d
      reset-seed-on-fill: true
      max-refills: -1
    frosted-ice:
      enabled: true
      delay:
        min: 20
        max: 40

Upvotes: 1

Views: 219

Answers (1)

torek

Reputation: 488253

Consider inserting a clean filter into your commit process. You will need to write the clean filter itself, add a filter-driver entry to your Git configuration that tells Git how to invoke it, and add a .gitattributes file that marks this particular file as needing a pass through the clean filter.

Background

Git stores commits, and commits hold content: file data for a collection of files, one collection of files per commit. For the most part, though, Git has no idea what that content means. Moreover, each commit stores a full snapshot of every file, so if the file's meaning is the same and you're making a new commit anyway, it's relatively harmless to just make a new snapshot with the rearranged lines.

I say relatively harmless because there are two obvious drawbacks:

  • When you, a human, compare the previous commit (the parent of the new commit) to the subsequent (new, or child) commit, you'll see that config.yml, or whatever this file's name is, has changed. That change is just noise, but you'll see it, because Git doesn't interpret the content, it just says: hey, this file changed.

  • Git has clever techniques to keep the repository's overall size small, despite the fact that every commit stores a complete copy of every file. The first technique is radically simple: Git hashes each file's content into what Git calls a blob object, which is retrieved by its hash ID, the same way that Git retrieves commits by their unique hash IDs. This hashing is exquisitely sensitive to the value and position of every byte (or bit, really) of input data. So a file that exactly matches a previous copy comes up with the same hash ID, but a file that is altered in any way, including just a rearrangement of some blocks within it, does not.

In other words, this file-content-rearranging defeats Git's first and most effective means of compressing away duplicate copies of files.
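To see how sensitive the hash is, here is a minimal Python sketch that computes a blob ID the way git hash-object does in a default (SHA-1) repository: the SHA-1 of a header "blob <size>", a NUL byte, and then the raw content. The two sample byte strings are a trimmed-down version of the question's file, with the same lines in a different order.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Compute the SHA-1 blob ID that git hash-object reports for data
    in a SHA-1 repository: the hash covers "blob <size>\\0" + content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


before = b"frosted-ice:\n  enabled: true\nlootables:\n  auto-replenish: false\n"
after = b"lootables:\n  auto-replenish: false\nfrosted-ice:\n  enabled: true\n"

# Same lines, different order: the blob IDs are completely different,
# so Git cannot reuse the stored blob for the rearranged file.
print(git_blob_id(before))
print(git_blob_id(after))
print(git_blob_id(before) == git_blob_id(after))  # False
```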

You say:

[It] doesn't matter where the lines are as the server rewrites the data where it got them. How can I prevent that?

and in a comment you add that:

the data [is encoded as] yaml [and represents an unordered map]

Swapping lines around randomly within a YAML file would change the stored data, but swapping the position of individual key-value pairs (which may or may not span one or more lines) within an unordered mapping does indeed leave the overall mapping unchanged (since the mapping itself is unordered).
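The distinction between the text and the mapping it encodes can be shown in a few lines of Python. The standard library has no YAML parser, so this sketch substitutes JSON, whose objects are unordered mappings in the same sense; the sample keys are borrowed from the question.

```python
import json

# JSON stands in for YAML here: both describe an unordered mapping.
before = '{"frosted-ice": {"enabled": true}, "lootables": {"max-refills": -1}}'
after = '{"lootables": {"max-refills": -1}, "frosted-ice": {"enabled": true}}'

print(before == after)                          # False: the text differs
print(json.loads(before) == json.loads(after))  # True: the mappings are equal
```

Git only ever sees the first comparison; it has no way to perform the second.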

The problem here is that Git has no idea that this is an unordered mapping represented as YAML lines. You may know it, but Git doesn't—and there's no easy way to tell Git, as Git doesn't know YAML from JSON, nor either from a hole in a file.

In the end, then, you have only three options:

  1. Ignore the noise and lack-of-compression. That is, put up with this. It won't fail, it's just annoying.

  2. Teach whatever rewrites this file in place to use a stable order, despite the file containing an unordered mapping. To express the data as YAML lines, whatever writes out the mapping has to pick some order, because lines have order. The program writing this YAML file can simply sort the mapping by key before writing it. See What is stability in sorting algorithms and why is it important?

  3. Insert your own sorting.

Approach 1 is the simplest of all, but has the drawback of leaving you annoyed. (Disk space is cheap, and Git's secondary compression techniques may eventually compress away most of this stuff anyway, so the saves-space argument may be weak, but the annoys-the-human argument is pretty strong.)

I personally view the second approach as the best, but if you don't control this other program, you might not be able to fix it.
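If you could change the program that writes the file, the fix amounts to sorting keys at output time. Real YAML libraries typically offer this; PyYAML's yaml.dump, for example, sorts mapping keys by default (its sort_keys parameter). The Python sketch below shows the idea with a hypothetical minimal writer, dump_sorted, that only handles nested dictionaries with simple scalar values: whichever order the dictionary was built in, the emitted text comes out the same.

```python
def _emit(mapping, indent, out):
    """Append block-style YAML lines for mapping to out, keys sorted."""
    pad = "  " * indent
    for key in sorted(mapping):
        value = mapping[key]
        if isinstance(value, dict) and value:
            out.append(f"{pad}{key}:")
            _emit(value, indent + 1, out)
        elif isinstance(value, dict):
            out.append(f"{pad}{key}: {{}}")  # empty mapping
        elif isinstance(value, bool):
            out.append(f"{pad}{key}: {'true' if value else 'false'}")
        else:
            out.append(f"{pad}{key}: {value}")  # assumes no quoting needed


def dump_sorted(mapping):
    """Hypothetical minimal YAML writer with a deterministic key order."""
    out = []
    _emit(mapping, 0, out)
    return "\n".join(out) + "\n"


# The same data, built in two different insertion orders.
config_a = {"lootables": {"refresh-min": "12h", "max-refills": -1},
            "frosted-ice": {"enabled": True, "delay": {"min": 20, "max": 40}}}
config_b = {"frosted-ice": {"delay": {"max": 40, "min": 20}, "enabled": True},
            "lootables": {"max-refills": -1, "refresh-min": "12h"}}

print(dump_sorted(config_a) == dump_sorted(config_b))  # True
print(dump_sorted(config_a), end="")
```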

The third approach means that you must write your own program. If you have a YAML library that can read the file into an ordered mapping, sort that mapping, and write it back out, consider doing that.
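As a sketch of approach 3, here is a small Python program that could serve as the clean filter. It does not use a real YAML library (Python's standard library has none); instead it assumes a restricted, indentation-only subset of YAML like the file in the question, and sorts each level of keys while keeping nested lines attached to their parent key. The file name sort_yaml.py and the clean/smudge arguments are hypothetical, chosen for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical clean/smudge filter that sorts block-style YAML keys.

Assumptions: every non-blank line is "key: value" or "key:" indented with
spaces; there are no lists, comments, flow collections, or multi-line
scalars. Blank lines are dropped. Git sends the content on stdin and
reads the filtered result from stdout.
"""
import sys


def parse_blocks(lines, start=0, indent=0):
    """Group lines into (header, children) blocks at one nesting level.

    A line's children are the following lines indented more deeply than
    it. Returns the blocks and the index of the first unconsumed line.
    """
    blocks = []
    i = start
    while i < len(lines):
        line = lines[i]
        depth = len(line) - len(line.lstrip(" "))
        if depth < indent:
            break  # the line belongs to an enclosing level
        children, i = parse_blocks(lines, i + 1, depth + 1)
        blocks.append((line, children))
    return blocks, i


def key_of(block):
    """Sort key for a block: the text before the first colon."""
    header, _children = block
    return header.split(":", 1)[0].strip()


def emit(blocks, out):
    """Append blocks to out in key order, recursing into children."""
    for header, children in sorted(blocks, key=key_of):
        out.append(header)
        emit(children, out)


def sort_yaml_text(text):
    """Return text with the keys of every mapping level in sorted order."""
    lines = [line for line in text.replace("\r\n", "\n").split("\n") if line.strip()]
    blocks, _ = parse_blocks(lines)
    out = []
    emit(blocks, out)
    return "\n".join(out) + "\n" if out else ""


def main(argv):
    if argv == ["clean"]:
        # Filter what Git sends on stdin; never open the work-tree file.
        text = sys.stdin.buffer.read().decode("utf-8")
        sys.stdout.buffer.write(sort_yaml_text(text).encode("utf-8"))
    elif argv == ["smudge"]:
        # Pass-through, equivalent to using cat as the smudge filter.
        sys.stdout.buffer.write(sys.stdin.buffer.read())
    else:
        print("usage: sort_yaml.py clean|smudge", file=sys.stderr)


if __name__ == "__main__":
    main(sys.argv[1:])
```

With the script saved somewhere, the driver's clean line could read, for example, clean = python3 /path/to/sort_yaml.py clean. Running either version of the question's file through it should produce the same output: frosted-ice before lootables, and each nested level in alphabetical order.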

Using a clean filter

To use a clean filter, you need to give it a name. You could call yours sort-yaml-dictionary or something along those lines. The following lines would then go into .git/config, $HOME/.gitconfig, or a similar configuration file:

    [filter "sort-yaml-dictionary"]
        clean = the-program-you-wrote

You don't have to define a smudge filter here as well, but you can if you like, using a program that simply copies its input back to its output, such as cat. See the gitattributes documentation for additional details here.

Then, in your .gitattributes file, you would include the line:

    config.yml  filter=sort-yaml-dictionary

This tells Git:

  • Every time I run git add on config.yml, don't just copy the work-tree file into Git's index. Instead, read the file. Send its data to the clean filter for the sort-yaml-dictionary driver, as input to that filter. Read the output from that filter back. Store the resulting data in the index, under the file's original name—in this case, config.yml.

Since your program reads the configuration, sorts it into a stable key-value order, and writes it back out, what goes into commits will be a sorted version of the config.yml file, regardless of what's in the work-tree version.
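Here is a simplified Python model of that flow, assuming an "index" that is just a dictionary from path to blob ID (the real index records more, such as file modes and timestamps). The clean filter here is a hypothetical stand-in that only reorders top-level keys, keeping each key's indented lines attached to it. Because the filter runs before hashing, both orderings of the work-tree file stage the same blob.

```python
import hashlib


def blob_id(data: bytes) -> str:
    """SHA-1 blob ID as git hash-object computes it in a SHA-1 repository."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def sort_top_level(data: bytes) -> bytes:
    """Hypothetical stand-in clean filter: reorder top-level keys only,
    keeping each key's indented (and blank) lines attached to it."""
    blocks = []
    for line in data.decode("utf-8").splitlines(keepends=True):
        if not line.endswith("\n"):
            line += "\n"  # so a final line can be moved without merging
        starts_block = line.strip() != "" and not line.startswith(" ")
        if starts_block or not blocks:
            blocks.append([line])
        else:
            blocks[-1].append(line)
    blocks.sort(key=lambda block: block[0])
    return "".join("".join(block) for block in blocks).encode("utf-8")


def git_add(worktree: bytes, clean, index: dict, path: str) -> str:
    """Model of git add for a filtered path: clean the work-tree bytes,
    hash the cleaned result, and record that blob ID in the index."""
    index[path] = blob_id(clean(worktree))
    return index[path]


before = b"frosted-ice:\n  enabled: true\nlootables:\n  max-refills: -1\n"
after = b"lootables:\n  max-refills: -1\nfrosted-ice:\n  enabled: true\n"

index = {}
first = git_add(before, sort_top_level, index, "config.yml")
second = git_add(after, sort_top_level, index, "config.yml")
print(first == second)  # True: both orderings stage the same blob
```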

It's up to you whether to try to have your program also sort the work-tree copy in-place (i.e., read and then rewrite the work-tree copy), but if you do, make sure you do this either completely before or completely after the file has been copied to the index. Otherwise you run the risk of writing to the work-tree file while Git is trying to read the work-tree file to send the data to your filter program.
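If you do decide to rewrite the work-tree copy as well, one common way to make sure Git never reads a half-written file is to write the new content to a temporary file in the same directory and then rename it over the original with os.replace, which replaces the file atomically on POSIX systems. This is a technique added here, not something Git requires, and it does not replace the advice above: a concurrent reader sees either the complete old file or the complete new one, but it may still see the old one. The function name rewrite_atomically is hypothetical.

```python
import os
import shutil
import tempfile


def rewrite_atomically(path, transform):
    """Replace the file at path with transform(old_text).

    The new text goes to a temporary file in the same directory, which is
    then renamed over the original, so a concurrent reader (such as git
    add) sees either the complete old file or the complete new one.
    """
    with open(path, encoding="utf-8") as f:
        new_text = transform(f.read())
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(new_text)
        shutil.copymode(path, tmp_path)  # keep the original permissions
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp_path)
        raise


# Demo on a flat mapping, where sorting whole lines happens to be safe;
# nested YAML needs a block-aware sort instead.
with tempfile.TemporaryDirectory() as d:
    cfg = os.path.join(d, "config.yml")
    with open(cfg, "w", encoding="utf-8") as f:
        f.write("b: 1\na: 2\n")
    rewrite_atomically(cfg, lambda text: "".join(sorted(text.splitlines(keepends=True))))
    with open(cfg, encoding="utf-8") as f:
        result = f.read()
    leftovers = os.listdir(d)

print(result, end="")
```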

Do not have your filter open and read the work-tree file. Filter the data Git sends to your program on its standard input instead, because that input may not match the work-tree file's content. (The clean filter can be invoked via git merge, on content that is not in the work-tree at all.)

Upvotes: 1
