Reputation: 365
I'm trying to convert JSON data to YAML but I'm getting unexpected YAML output
Online tools convert this JSON to the YAML I expect, but when I use the same JSON in the Python code below, I get a different result.
    import yaml

    job_template = [
        {
            "job-template": {
                "name": "{name}_job",
                "description": "job description",
                "project-type": "multibranch",
                "number-to-keep": 30,
                "days-to-keep": 30,
                "scm": [
                    {
                        "git": {
                            "url": "{git_url}"
                        }
                    }
                ]
            }
        }
    ]

    yaml.dump(job_template, open("job_template.yaml", "w"))
I expect the following YAML:

    - job-template:
        name: "{name}_job"
        description: job description
        project-type: multibranch
        number-to-keep: 30
        days-to-keep: 30
        scm:
        - git:
            url: "{git_url}"
But I get this YAML instead:

    - job-template:
        days-to-keep: 30
        description: job description
        name: '{name}_job'
        number-to-keep: 30
        project-type: multibranch
        scm:
        - git: {url: '{git_url}'}
Upvotes: 1
Views: 1626
Reputation: 651
The change of key ordering in PyYAML is an impediment to round-trip edits of YAML files, and a number of other parsers have sought to fix that.
One worth looking at is ruamel.yaml, which says on its overview page:
"block style and key ordering are kept, so you can diff the round-tripped source"
A code example provided by the author demonstrates this:
    import sys
    import ruamel.yaml as yaml

    yaml_str = """\
    3: abc
    conf:
        10: def
        3: gij # h is missing
    more:
    - what
    - else
    """

    data = yaml.load(yaml_str, Loader=yaml.RoundTripLoader)
    data['conf'][10] = 'klm'
    data['conf'][3] = 'jig'
    yaml.dump(data, sys.stdout, Dumper=yaml.RoundTripDumper)
will give you:
    3: abc
    conf:
        10: klm
        3: jig # h is missing
    more:
    - what
    - else
This is more fully discussed here. It is described as a drop-in replacement for PyYAML, so it should be easy to experiment with in your environment.
Upvotes: 0
Reputation: 76912
First of all, you should just leave your job template in a JSON file, e.g. input.json:
    [
        {
            "job-template": {
                "name": "{name}_job",
                "description": "job description",
                "project-type": "multibranch",
                "number-to-keep": 30,
                "days-to-keep": 30,
                "scm": [
                    {
                        "git": {
                            "url": "{git_url}"
                        }
                    }
                ]
            }
        }
    ]
That way you can more easily adapt your script to process different files. Doing so also guarantees that the keys in your JSON objects are ordered, something that is not guaranteed when you include the JSON as dicts and lists in your code, at least not for all current versions of Python.
Then, because YAML 1.2 (spec issued in 2009) is a superset of JSON, you can just use a YAML 1.2 library that preserves key order when loading and dumping to convert this to the format you want. Since PyYAML is still stuck at the YAML 1.1 specification issued in 2005, you cannot use that, but you can use ruamel.yaml (disclaimer: I am the author of that package).
The only "problem" is that ruamel.yaml will also preserve the (flow) style of your input, which is exactly what you don't want. So you have to recursively walk over the data structure and change the attribute containing that information:
    import sys
    import ruamel.yaml

    def block_style(d):
        """Recursively switch every mapping and sequence to block style."""
        if isinstance(d, dict):
            d.fa.set_block_style()
            for key, value in d.items():
                try:
                    # force double quotes on strings containing '{'
                    if '{' in value:
                        d[key] = ruamel.yaml.scalarstring.DoubleQuotedScalarString(value)
                except TypeError:
                    # non-iterable values such as ints end up here
                    pass
                block_style(value)
        elif isinstance(d, list):
            d.fa.set_block_style()
            for elem in d:
                block_style(elem)

    yaml = ruamel.yaml.YAML()

    with open('input.json') as fp:
        data = yaml.load(fp)

    block_style(data)

    yaml.dump(data, sys.stdout)
which gives:
    - job-template:
        name: "{name}_job"
        description: job description
        project-type: multibranch
        number-to-keep: 30
        days-to-keep: 30
        scm:
        - git:
            url: "{git_url}"
The above works equally well for Python 2 and Python 3.
The extra code testing for '{' is there to enforce double quotes around the strings that cannot be represented as plain scalars. By default ruamel.yaml would use single-quoted scalars if the extra escape sequences available in YAML double-quoted scalars are not needed to represent the string.
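On the key-order point: if the file is read with the standard `json` module instead of a YAML library, passing `object_pairs_hook=OrderedDict` is one way to keep the keys in document order on any Python version. A small sketch, where the inline string merely stands in for input.json and is trimmed for brevity:

```python
import json
from collections import OrderedDict

# Stand-in for the contents of input.json (an assumption for this sketch).
text = '{"name": "{name}_job", "description": "job description", "days-to-keep": 30}'

# object_pairs_hook receives the key/value pairs in document order.
data = json.loads(text, object_pairs_hook=OrderedDict)
keys = list(data)
print(keys)
```

The resulting `OrderedDict` keeps `name` before `description` before `days-to-keep`, exactly as written in the JSON text.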
Upvotes: -1
Reputation: 149185
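The problem is in the Python code: a dict is an unordered container (insertion order is only guaranteed from Python 3.7 onwards), and PyYAML's dump sorts mapping keys by default anyway. pprint, which also sorts keys, shows the same order as your YAML output: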
    >>> pprint.pprint(job_template)
    [{'job-template': {'days-to-keep': 30,
                       'description': 'job description',
                       'name': '{name}_job',
                       'number-to-keep': 30,
                       'project-type': 'multibranch',
                       'scm': [{'git': {'url': '{git_url}'}}]}}]
If the question was about the style of the representation of the innermost dict {"url": "{git_url}"}, the answer has been given by @Rakesh.
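The sorting can be seen with the standard library alone. A short sketch, assuming Python 3.7 or later and using a trimmed copy of the template (the variable names are only for illustration):

```python
import pprint

template = {"name": "{name}_job", "description": "job description", "days-to-keep": 30}

# Iterating a dict yields keys in the order they were written (Python 3.7+).
insertion_order = list(template)

# pprint sorts dict keys by default, so its output is alphabetical.
printed = pprint.pformat(template)

print(insertion_order)
print(printed)
```

So the alphabetical order in the pprint output comes from pprint itself and does not show how the dict stores its keys.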
Upvotes: 1
Reputation: 82805
Use default_flow_style=False. For example:
    import yaml

    job_template = [
        {
            "job-template": {
                "name": "{name}_job",
                "description": "job description",
                "project-type": "multibranch",
                "number-to-keep": 30,
                "days-to-keep": 30,
                "scm": [
                    {
                        "git": {
                            "url": "{git_url}"
                        }
                    }
                ]
            }
        }
    ]

    # default_flow_style=False writes nested collections in block style
    with open("job_template.yaml", "w") as fp:
        yaml.dump(job_template, fp, default_flow_style=False)
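To also keep the keys in the order they were written, PyYAML 5.1 and later accept `sort_keys=False`. A sketch assuming such a PyYAML version and Python 3.7+, with a trimmed template that is dumped to a string rather than a file:

```python
import yaml

job_template = [
    {
        "job-template": {
            "name": "{name}_job",
            "project-type": "multibranch",
            "days-to-keep": 30,
            "scm": [{"git": {"url": "{git_url}"}}],
        }
    }
]

# Block style for nested collections, and no alphabetical sorting of keys.
text = yaml.dump(job_template, default_flow_style=False, sort_keys=False)
print(text)
```

PyYAML still chooses the quoting style itself, so the `{name}_job` and `{git_url}` strings may come out single-quoted rather than double-quoted.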
Upvotes: 2