Alfred Völkl

Reputation: 33

How to compress a YAML file using references in a script?

I am converting a JSON file to a YAML file with https://github.com/nodeca/js-yaml using safeDump.

The outcome looks like this:

en:
  models: 
    errors:
      name: name not found
      url: bad url
  user: 
    errors:
      name: name not found
      url: bad url
  photo:
    errors:
      name: name not found
      url: bad url

But I want a script that compresses the output using references:

en:
  models: 
    errors: &1
      name: name not found
      url: bad url
  user:
    errors: *1
  photo:
    errors: *1

Upvotes: 2

Views: 2022

Answers (3)

Anthon

Reputation: 76722

What you want to do is "compress" the JSON input into YAML with references for those mappings that have exactly the same key-value pairs. To achieve that, you need to be able to find such matching mappings, and one way to do that is to create a lookup table keyed on the string representation of each mapping after sorting its keys.
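A minimal sketch of such a lookup key, assuming the data came from JSON so that `json.dumps` can serialize it. The helper name `canonical_key` is hypothetical and is not part of the script in this answer:

```python
import json


def canonical_key(value):
    # sort_keys=True also sorts nested mappings, so key order never matters
    return json.dumps(value, sort_keys=True)


a = {"name": "name not found", "url": "bad url"}
b = {"url": "bad url", "name": "name not found"}  # same pairs, other order

lookup = {}
lookup.setdefault(canonical_key(a), a)           # first occurrence is stored
shared = lookup.setdefault(canonical_key(b), b)  # second one finds the first

print(shared is a)
```

A plain `repr()` of the two mappings would differ here, because Python dicts keep insertion order.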

Assuming this JSON input in input.json:

{
  "en": {
    "models": {
      "errors": {
        "name": "name not found",
        "url": "bad url"
      }
    },
    "user": {
      "errors": {
        "name": "name not found",
        "url": "bad url"
      }
    },
    "photo": {
      "errors": {
        "name": "name not found",
        "url": "bad url"
      }
    }
  }
}

You can convert it with this Python script:

import json
import sys
from pathlib import Path
import ruamel.yaml

in_file = Path('input.json')


def optmap(d, mappings=None):
    # mappings: lookup table from repr() of a value to its first occurrence
    if mappings is None:
        mappings = {}
    if isinstance(d, dict):
        for k in d:
            v = d[k]
            sv = repr(v)
            ref = mappings.get(sv)
            if ref is not None:
                # seen before: reuse the first occurrence so the dumper emits an alias
                d[k] = ref
            else:
                mappings[sv] = v
                optmap(d[k], mappings)
    elif isinstance(d, list):
        for idx, item in enumerate(d):
            sitem = repr(item)
            ref = mappings.get(sitem)
            if ref is not None:
                d[idx] = ref
            else:
                mappings[sitem] = item
                optmap(item, mappings)


data = json.load(in_file.open())
optmap(data)
yaml = ruamel.yaml.YAML()
yaml.serializer.ANCHOR_TEMPLATE = u'%d'
yaml.dump(data, sys.stdout)

which gives:

en:
  models: &1
    errors:
      name: name not found
      url: bad url
  user: *1
  photo: *1

The above will also make references to, and traverse, arrays in your JSON.
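To illustrate the array handling without a YAML library, here is a self-contained sketch that only checks object identity after deduplication. The function `share_duplicates` is a hypothetical variant of `optmap()`: it assumes JSON-compatible data, uses `json.dumps` with sorted keys as the lookup key, and skips scalars.

```python
import json


def share_duplicates(node, seen=None):
    """Replace structurally equal dicts/lists with one shared object, in place."""
    if seen is None:
        seen = {}
    if isinstance(node, dict):
        items = list(node.items())
    elif isinstance(node, list):
        items = list(enumerate(node))
    else:
        return node
    for key, value in items:
        if not isinstance(value, (dict, list)):
            continue  # this sketch only shares collections, not scalars
        canon = json.dumps(value, sort_keys=True)
        if canon in seen:
            node[key] = seen[canon]  # point at the first occurrence
        else:
            seen[canon] = value
            share_duplicates(value, seen)
    return node


data = {"a": [{"x": 1}, {"x": 1}], "b": [{"x": 1}, {"x": 1}]}
share_duplicates(data)

print(data["a"] is data["b"])        # both lists are now one object
print(data["a"][0] is data["a"][1])  # so are the list items
```

A dumper that tracks object identity can then emit an anchor for the first occurrence and an alias for each repeat.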

As you can see, your output can be "compressed" further than you thought it could be.


I am not fluent enough in JavaScript to have written this answer in that language (without investing too much effort and delivering some ugly code), but the OP obviously understood the intent of optmap() and implemented it in his answer.

Upvotes: 0

Alfred Völkl

Reputation: 33

Based on the Python script from Anthon (https://stackoverflow.com/a/55808583/10103951):

function buildRefsJson(inputJson, mappings = null) {
    if (!mappings) {
        mappings = {}
    }
    // typeof null is also 'object', so guard against it explicitly
    if (inputJson !== null && typeof inputJson === 'object') {
        for (const key in inputJson) {
            const value = inputJson[key]
            // the serialized value serves as the lookup key
            const stringValue = JSON.stringify(value)
            const ref = mappings[stringValue]
            if (ref) {
                // seen before: point at the first occurrence
                inputJson[key] = ref
            } else {
                mappings[stringValue] = value
                buildRefsJson(value, mappings)
            }
        }
    }
}

I translated it to JavaScript, and it did the job! Thanks also to Niroj for helping.

Upvotes: 1

Xie Guanglei

Reputation: 456

Sadly, as far as I know there is no solution for converting JSON to YAML with references, because JSON has no such 'references' rule for repeating nodes. As the spec says, YAML can therefore be viewed as a natural superset of JSON.

Upvotes: 0
