mskm

Reputation: 37

ruamel.yaml: How to preserve structure of dict in YAML

I am using ruamel.yaml to edit YAML files and dump them. I need help keeping the layout the same as in the original file.

My YAML file contains the content below. I don't modify this part, but when I load the file, edit other parts, and dump it, the layout of this content changes:

    parameters: {
      "provision_pipeline": "provision-integrations",
      "enable_sfcd_ds_argo_operator": "false",
      "clustermanagement_helm_values_path": "sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml"
    }

However, after dumping, it is changed to the format below:

    parameters: {"provision_pipeline": "provision-integrations", "enable_sfcd_ds_argo_operator": "false",
      "clustermanagement_helm_values_path": "sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml"}

Code:

import ruamel.yaml


def addTargetToBaseIntegFileAndUpdate(deploymentTarget, fi, env, samvmf_repo, folder, pipelineversionintegration, basefile):
    ryaml = ruamel.yaml.YAML()  # round-trip mode (the default)
    ryaml.preserve_quotes = True
    ryaml.default_flow_style = False
    ryaml.indent(mapping=2)

    with open(basefile, "r") as file:
        yamldata = ryaml.load(file)
        deploymentTargets = yamldata["targets"]["stagger_groups"]
        target = ""
        doesFIExist = False
        fi_index = 0

        for index, sg in enumerate(deploymentTargets):
            if sg["name"] == env.lower():
                target = deploymentTargets[index]
                for i, fi_item in enumerate(target["falcon_instances"]):
                    if fi_item["name"] == fi.lower():
                        fi_index = i
                        doesFIExist = True
                        break
                if doesFIExist:
                    yamldata["targets"]["stagger_groups"][index]["f_instances"][fi_index]["f_domains"].append(deploymentTarget["f_instances"][0]["f_domains"][0])
                else:
                    yamldata["targets"]["stagger_groups"][index]["f_instances"].append(deploymentTarget["f_instances"][0])
                break

    with open(basefile, "w") as fileobj:
        ryaml.dump(yamldata, fileobj)

Upvotes: 0

Views: 3210

Answers (2)

Anthon

Reputation: 76599

ruamel.yaml doesn't preserve newlines between flow-style mapping elements. The only setting that affects them is yaml.width, which makes the emitter wrap lines that get too long. For example, with your input and the width set to 40, you'll get:

parameters: {"provision_pipeline": "provision-integrations",
  "enable_sfcd_ds_argo_operator": "false",
  "clustermanagement_helm_values_path": "sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml"}

But there is no setting that puts the first key-value pair on a new line, nor one that puts the closing curly brace on a line of its own.

Your addition ryaml.default_flow_style = False only affects completely new dicts and lists that you add to the data structure; mappings loaded from the file keep the style they were loaded with.

You should consider switching to block style and dropping all non-essential quotes; that makes the YAML both less verbose and more readable. For the program that loads the data this makes no difference, and the conversion is easily done by loading in normal safe mode (which does not attach block/flow-style information to the loaded data):

import sys
import pathlib
import ruamel.yaml

basefile = pathlib.Path('input.yaml')

data = ruamel.yaml.YAML(typ='safe').load(basefile)
yaml = ruamel.yaml.YAML()
yaml.dump(data, sys.stdout)

which gives:

parameters:
  provision_pipeline: provision-integrations
  enable_sfcd_ds_argo_operator: 'false'
  clustermanagement_helm_values_path: sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml

The string scalar 'false' needs to get quoted in order not to be confused with the boolean false.

If the above improvement is unacceptable, e.g. if further processing is done with something other than a full YAML parser, you can post-process the output:

import sys
import pathlib
import ruamel.yaml

basefile = pathlib.Path('input.yaml')

def splitflowmap(s):
    res = []
    for line in s.splitlines():
        if ': {' in line and line[-1] == '}':
            start, rest = line.split(': {', 1)
            start = start + ': {'
            indent = '  '  # two spaces more than the start
            for idx, ch in enumerate(start):
                if ch != ' ':
                    break
                indent += ' '
            res.append(start)
            rest = rest[:-1]  # cut off the closing }
            for x in rest.split(', '):  # if you always have quotes it is safer to split on '", "'
                res.append(f'{indent}{x},')
            res[-1] = res[-1][:-1]  # delete trailing comma
            res.append(f'{indent[2:]}}}')  # re-add the cut-off } on a line of its own
            continue
        res.append(line)
    return '\n'.join(res) + '\n'

yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
yaml.width = 2**16
data = yaml.load(basefile)
yaml.dump(data, sys.stdout, transform=splitflowmap)

which gives:

parameters: {
  "provision_pipeline": "provision-integrations",
  "enable_sfcd_ds_argo_operator": "false",
  "clustermanagement_helm_values_path": "sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml"
}

Upvotes: 3

CrazyChucky

Reputation: 3518

As Nick Bailey pointed out in the comments, this is a stylistic change, not a structural one. That is, the data is the same, it's just presented differently.

Now, as for what that style is, YAML has two styles of presenting data structures:

  • Block style: Each key/value starts on a new line, and both lists and dictionaries (mappings) are started and stopped via indentation. This is usually the preferred style, as it is more human-readable.

  • Flow style: Lists are enclosed in square brackets and mappings in curly braces, and items are separated by commas, as in JSON. Line breaks between key/value pairs aren't required, but they aren't disallowed either. This format is more commonly used for smaller, simpler data structures, especially on a single line, since it can save space.

The original YAML you've shown is one key/value pair within a larger block-style mapping, but the value itself isn't block style; it's just flow style with extra line breaks added. I think you probably want this instead, fully in block style:

test:
  parameters:
    "provision_pipeline": "provision-integrations"
    "enable_sfcd_ds_argo_operator": "false"
    "clustermanagement_helm_values_path": "sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml"

ruamel.yaml, in its default (roundtrip) mode, preserves flow or block style, whichever you give it, but I don't know of a way to make it remember specific line breaks that you've added within a flow section. See this comparison:

import sys
from ruamel.yaml import YAML

yaml_string_1 = """\
test:
  parameters: {
    "provision_pipeline": "provision-integrations",
    "enable_sfcd_ds_argo_operator": "false",
    "clustermanagement_helm_values_path": "sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml"
  }
"""
yaml_string_2 = """\
test:
  parameters:
    "provision_pipeline": "provision-integrations"
    "enable_sfcd_ds_argo_operator": "false"
    "clustermanagement_helm_values_path": "sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml"
"""

yaml = YAML()
for yaml_string in [yaml_string_1, yaml_string_2]:
    output = yaml.load(yaml_string)
    yaml.dump(output, sys.stdout)
    print()

Output:

test:
  parameters: {provision_pipeline: provision-integrations, enable_sfcd_ds_argo_operator: 'false',
    clustermanagement_helm_values_path: sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml}

test:
  parameters:
    provision_pipeline: provision-integrations
    enable_sfcd_ds_argo_operator: 'false'
    clustermanagement_helm_values_path: sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml

You can also, of course, add preserve_quotes and whatever other options you need.

Upvotes: 3
