Reputation: 356
I'm trying to split a YAML file into two separate files, as shown below:
def yaml_loader():
    try:
        with open("test.yaml", "r") as stream:
            data = yaml.load(stream)
            for workload in data:
                with open(workload['workload']['name'] + '.yaml', 'a') as outfile:
                    yaml.dump(workload, outfile)
    except yaml.YAMLError as out:
        print(out)
YAML:
- workload:
    name: c1
    param:
      p1: 1
      p2: 2
- workload:
    name: c2
    param:
      p1: 30
      p2: 200
But in the output, both files are missing the leading "-" that YAML uses for list items:
workload:
  name: c1
  param:
    p1: 1
    p2: 2
How can I fix this?
Upvotes: 0
Views: 5194
Reputation: 79
As @Ignacio Vazquez-Abrams said:
def yaml_loader():
    try:
        with open("test.yaml", "r") as stream:
            data = yaml.load(stream, Loader=yaml.FullLoader)
            for workload in data:
                with open(workload['workload']['name'] + '.yaml', 'a') as outfile:
                    # Putting "workload" in a list gives you the '-' in the YAML, as it denotes a list item.
                    yaml.dump([workload], outfile)
    except yaml.YAMLError as out:
        print(out)
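To see the effect of the list wrapper in isolation, here is a small sketch using PyYAML's safe_dump(). The workload dict is just the first entry from the question typed in by hand, and default_flow_style=False is passed only to keep the output in block style.

```python
import yaml

# Sample data: the first entry of the question's test.yaml, written as a Python dict.
workload = {"workload": {"name": "c1", "param": {"p1": 1, "p2": 2}}}

# Dumping the mapping on its own produces no leading "-".
as_mapping = yaml.safe_dump(workload, default_flow_style=False)

# Dumping a one-element list turns the mapping into a sequence item.
as_sequence = yaml.safe_dump([workload], default_flow_style=False)

print(as_mapping)
print(as_sequence)
```

Loading as_sequence back gives a list with one mapping, which matches the structure of the original file.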
In YAML, "- item:" denotes a list item, so unless you put your output in a list, you won't get the "-".
Upvotes: 2
Reputation: 76732
Your code is not working as expected, but also not as you indicate.
On the first run you will get this output in c1.yaml:
workload:
  name: c1
  param: {p1: 1, p2: 2}
because by default a load() followed by a dump() (in both ruamel.yaml and PyYAML) uses flow style for collection elements (mapping, sequence) that don't contain other collection elements.
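A minimal sketch of that behaviour with PyYAML, assuming the default_flow_style keyword of yaml.dump(): None lets leaf collections go inline, False forces block style everywhere. Which value is the default has differed between library releases, so passing it explicitly avoids relying on it. The workload dict is sample data copied from the question.

```python
import yaml

# Sample data: the first entry of the question's test.yaml.
workload = {"workload": {"name": "c1", "param": {"p1": 1, "p2": 2}}}

# None: collections that contain only scalars are written inline (flow style).
mixed = yaml.dump(workload, default_flow_style=None)

# False: every collection is written in block style.
block = yaml.dump(workload, default_flow_style=False)

print(mixed)
print(block)
```

Both strings load back to the same data; only the layout differs.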
Additionally, if you ever invoke that routine a second time, your c1.yaml file will contain:
workload:
  name: c1
  param: {p1: 1, p2: 2}
workload:
  name: c1
  param: {p1: 1, p2: 2}
because you open for appending.
Using load() for this kind of file splitting might also not be a good idea if you have no control over the source of the file and it possibly contains YAML tags.
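As a rough illustration with PyYAML, safe_load() refuses tags it has no constructor for instead of acting on them. The untrusted string below is a made-up example of such input, not something from the question.

```python
import yaml

# Hypothetical untrusted input carrying a Python-specific tag.
untrusted = "!!python/object/apply:os.getcwd []"

try:
    yaml.safe_load(untrusted)
    rejected = False
except yaml.YAMLError as exc:
    # safe_load has no constructor for this tag, so it raises
    # instead of calling os.getcwd().
    rejected = True
    print("rejected:", type(exc).__name__)
```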
The above problems come on top of the missing top-level sequence element, which can easily be solved as @Ignacio Vazquez-Abrams indicated.
There is also no need to put the for statement within the first with statement, which only delays closing the input file: data is not an iterator, it is already fully loaded.
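With plain PyYAML, the points so far could be combined roughly as follows. This is only a sketch: split_workloads and its out_dir parameter are hypothetical names introduced here, and it assumes every top-level item has a workload mapping with a name key, as in the question.

```python
from pathlib import Path

import yaml


def split_workloads(source, out_dir):
    """Write each top-level workload in `source` to its own <name>.yaml file."""
    # Parse everything up front; the input file is closed when this block ends.
    with open(source, "r") as stream:
        data = yaml.safe_load(stream)

    written = []
    # data is a plain list by now, so the loop does not need the input file open.
    for workload in data:
        target = Path(out_dir) / (workload["workload"]["name"] + ".yaml")
        # "w" overwrites, so running this twice does not duplicate the content.
        with open(target, "w") as outfile:
            # Wrapping in a list keeps the leading "-" of the sequence item.
            yaml.safe_dump([workload], outfile, default_flow_style=False)
        written.append(target)
    return written
```

Calling split_workloads("test.yaml", ".") would write c1.yaml and c2.yaml next to the input. Comments and quoting from the source file are not kept this way.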
I suggest using ruamel.yaml¹ round-trip loading/dumping to do this right. Apart from preserving block or flow style, it also supports YAML 1.2, keeps your comments, preserves the order of the mapping keys, and can keep the quotes around scalar strings in the source:
import ruamel.yaml as yaml

def yaml_loader():
    try:
        with open("test.yaml", "r") as stream:
            data = yaml.round_trip_load(stream, preserve_quotes=True)
        # the input file is closed here; data is already fully loaded
        for workload in data:
            # 'w' instead of 'a', so a second run overwrites instead of appending
            with open(workload['workload']['name'] + '.yaml', 'w') as outfile:
                yaml.round_trip_dump([workload], outfile)
    except yaml.YAMLError as out:
        print(out)

yaml_loader()
will give you:
- workload:
    name: c1
    param:
      p1: 1
      p2: 2
¹ This was done using ruamel.yaml, of which I am the author.
Upvotes: 0