Reputation: 37
I am using ruamel.yaml to edit YAML files and dump them, and I need help keeping the structure the same as in the original file.
My YAML file contains the section below. I do not modify this section, but when I load the file, edit other parts, and dump it, the layout of this section changes:
parameters: {
"provision_pipeline": "provision-integrations",
"enable_sfcd_ds_argo_operator": "false",
"clustermanagement_helm_values_path": "sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml"
}
However, after I dump the file, this section is written in the format below:
parameters: {"provision_pipeline": "provision-integrations", "enable_sfcd_ds_argo_operator": "false",
"clustermanagement_helm_values_path": "sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml"}
Code:
from collections import OrderedDict
import ruamel.yaml

def addTargetToBaseIntegFileAndUpdate(deploymentTarget, fi, env, samvmf_repo, folder, pipelineversionintegration, basefile):
    yamldata = OrderedDict()
    ryaml = ruamel.yaml.YAML()
    ryaml.preserve_quotes = True
    ryaml.default_flow_style = False
    ryaml.indent(mapping=2)
    with open(basefile, "r") as file:
        yamldata = ryaml.load(file)
        deploymentTargets = yamldata["targets"]["stagger_groups"]
        target = ""
        doesFIExist = False
        fi_index = 0
        for index, sg in enumerate(deploymentTargets):
            if sg["name"] == env.lower():
                target = deploymentTargets[index]
                for i, fi_item in enumerate(target["falcon_instances"]):
                    if fi_item["name"] == fi.lower():
                        fi_index = i
                        doesFIExist = True
                        break
                if doesFIExist:
                    yamldata["targets"]["stagger_groups"][index]["f_instances"][fi_index]["f_domains"].append(deploymentTarget["f_instances"][0]["f_domains"][0])
                else:
                    yamldata["targets"]["stagger_groups"][index]["f_instances"].append(deploymentTarget["f_instances"][0])
                break
    with open(basefile, "w") as fileobj:
        ryaml.dump(yamldata, fileobj)
Upvotes: 0
Views: 3210
Reputation: 76599
ruamel.yaml doesn't preserve newlines between flow-style mapping elements. The only thing affecting these is yaml.width, which wraps lines that are getting too long.
E.g. with your input, if you set the width to 40, you'll get:
parameters: {"provision_pipeline": "provision-integrations",
"enable_sfcd_ds_argo_operator": "false",
"clustermanagement_helm_values_path": "sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml"}
But there is no control that gets you the first key-value pair on a new line, nor that you get a closing curly brace on a line of its own.
Your addition ryaml.default_flow_style = False only affects completely new dicts and lists that you add to the data structure.
You should consider switching to block style and dropping all non-essential quotes; that makes the YAML both less verbose and more readable. For the program that loads the data this makes no difference, and the conversion is easily done by loading in normal safe mode (which does not set block/flow-style information on the loaded data):
import sys
import pathlib
import ruamel.yaml
basefile = pathlib.Path('input.yaml')
data = ruamel.yaml.YAML(typ='safe').load(basefile)
yaml = ruamel.yaml.YAML()
yaml.dump(data, sys.stdout)
which gives:
parameters:
  provision_pipeline: provision-integrations
  enable_sfcd_ds_argo_operator: 'false'
  clustermanagement_helm_values_path: sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml
The string scalar 'false' needs to be quoted so that it is not confused with the boolean false.
If the above improvement is unacceptable, e.g. if further processing is done with something other than a full YAML parser, you can post-process the output:
import sys
import pathlib
import ruamel.yaml
basefile = pathlib.Path('input.yaml')
def splitflowmap(s):
    res = []
    for line in s.splitlines():
        if ': {' in line and line[-1] == '}':
            start, rest = line.split(': {', 1)
            start = start + ': {'
            indent = '  '  # two spaces more than the start
            for idx, ch in enumerate(start):
                if ch != ' ':
                    break
                indent += ' '
            res.append(start)
            rest = rest[:-1]  # cut off the closing }
            for x in rest.split(', '):  # if you always have quotes it is safer to split on '", "'
                res.append(f'{indent}{x},')
            res[-1] = res[-1][:-1]  # delete trailing comma
            res.append(f'{indent[2:]}}}')  # re-add the cut-off } on a line of its own
            continue
        res.append(line)
    return '\n'.join(res) + '\n'
yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
yaml.width = 2**16
data = yaml.load(basefile)
yaml.dump(data, sys.stdout, transform=splitflowmap)
which gives:
parameters: {
  "provision_pipeline": "provision-integrations",
  "enable_sfcd_ds_argo_operator": "false",
  "clustermanagement_helm_values_path": "sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml"
}
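Splitting on ', ' goes wrong as soon as a value contains a comma followed by a space, or when a flow mapping is nested. A stricter splitter could look like the sketch below. The function name split_top_level is hypothetical, and it assumes double-quoted scalars as in the question; single-quoted scalars are not handled.

```python
def split_top_level(body):
    """Split the inside of a flow mapping on commas that are neither
    inside double quotes nor inside nested brackets or braces."""
    parts, current = [], []
    depth = 0
    in_quotes = False
    escaped = False
    for ch in body:
        if in_quotes:
            current.append(ch)
            if escaped:
                escaped = False        # character after a backslash is literal
            elif ch == '\\':
                escaped = True
            elif ch == '"':
                in_quotes = False      # closing quote
            continue
        if ch == '"':
            in_quotes = True
        elif ch in '{[':
            depth += 1
        elif ch in '}]':
            depth -= 1
        elif ch == ',' and depth == 0:
            parts.append(''.join(current).strip())
            current = []
            continue
        current.append(ch)
    tail = ''.join(current).strip()
    if tail:
        parts.append(tail)
    return parts

print(split_top_level('"a": "x, y", "b": {"c": 1, "d": 2}, "e": "z"'))
# ['"a": "x, y"', '"b": {"c": 1, "d": 2}', '"e": "z"']
```

In splitflowmap you could then iterate over split_top_level(rest) instead of rest.split(', ').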
Upvotes: 3
Reputation: 3518
As Nick Bailey pointed out in the comments, this is a stylistic change, not a structural one. That is, the data is the same, it's just presented differently.
Now, as for what that style is, YAML has two styles of presenting data structures:
Block style: Each key/value starts on a new line, and both lists and dictionaries (mappings) are started and stopped via indentation. This is usually the preferred style, as it is more human-readable.
Flow style: Lists/mappings are started and ended by brackets, and multiple key/values are separated by commas, as in JSON. Line breaks aren't required between key/value pairs, but also not disallowed. This format is more commonly used for smaller, simpler data structures, especially on a single line, since it can save space.
The original YAML you've shown is one key/value pair within a larger block-style mapping, but the value itself isn't block style; it's just flow style with extra line breaks added. I think you probably want this instead, fully in block style:
test:
  parameters:
    "provision_pipeline": "provision-integrations"
    "enable_sfcd_ds_argo_operator": "false"
    "clustermanagement_helm_values_path": "sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml"
ruamel.yaml, in its default (roundtrip) mode, preserves flow or block style, whichever you give it, but I don't know of a way to make it remember specific line breaks that you've added within a flow section. See this comparison:
import sys
from ruamel.yaml import YAML
yaml_string_1 = """\
test:
  parameters: {
    "provision_pipeline": "provision-integrations",
    "enable_sfcd_ds_argo_operator": "false",
    "clustermanagement_helm_values_path": "sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml"
  }
"""
yaml_string_2 = """\
test:
  parameters:
    "provision_pipeline": "provision-integrations"
    "enable_sfcd_ds_argo_operator": "false"
    "clustermanagement_helm_values_path": "sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml"
"""
yaml = YAML()
for yaml_string in [yaml_string_1, yaml_string_2]:
    output = yaml.load(yaml_string)
    yaml.dump(output, sys.stdout)
    print()
Output:
test:
  parameters: {provision_pipeline: provision-integrations, enable_sfcd_ds_argo_operator: 'false',
    clustermanagement_helm_values_path: sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml}

test:
  parameters:
    provision_pipeline: provision-integrations
    enable_sfcd_ds_argo_operator: 'false'
    clustermanagement_helm_values_path: sam/sam-helm-charts/kube-node-recycler-0.0.4-273/values.nodepool.yaml
You can also, of course, add preserve_quotes and whatever other options you need.
Upvotes: 3