hossein

Reputation: 356

Split a YAML file into two separate files

I'm trying to split a YAML file into two separate files, using the code below:

def yaml_loader():
  try:
    with open("test.yaml", "r") as stream:
       data = yaml.load(stream)
       for workload in data:
         with open(workload['workload']['name'] + '.yaml', 'a') as outfile:
              yaml.dump(workload, outfile)
  except yaml.YAMLError as out:
    print(out)

YAML:

- workload:
     name: c1
     param:
       p1: 1
       p2: 2

- workload:
    name: c2
    param:
      p1: 30
      p2: 200

But both output files are missing the leading - that YAML uses for a list item:

workload:
   name: c1
   param:
     p1: 1
     p2: 2

How can I fix this?

Upvotes: 0

Views: 5194

Answers (2)

Max Chesterfield

Reputation: 79

As @Ignacio Vazquez-Abrams said:

import yaml

def yaml_loader():
    try:
        with open("test.yaml", "r") as stream:
            data = yaml.load(stream, Loader=yaml.FullLoader)
            for workload in data:
                # 'w' overwrites the file, so re-running does not append duplicate entries.
                with open(workload['workload']['name'] + '.yaml', 'w') as outfile:
                    # If you put the variable "workload" in a list, you get the '-' in the YAML, as it denotes a list item.
                    yaml.dump([workload], outfile)
    except yaml.YAMLError as out:
        print(out)
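
To see the effect of the list wrapper in isolation, here is a minimal sketch using PyYAML. The variable names are only illustrative, and default_flow_style=False is passed explicitly so the output is block style regardless of the installed PyYAML version:

```python
import yaml  # PyYAML

workload = {"workload": {"name": "c1", "param": {"p1": 1, "p2": 2}}}

# Dumping the mapping on its own: no leading "-" in the output.
as_mapping = yaml.dump(workload, default_flow_style=False)

# Dumping a one-element list: the mapping becomes a sequence item,
# so the output starts with "- ".
as_sequence = yaml.dump([workload], default_flow_style=False)

print(as_mapping)
print(as_sequence)
```

Loading as_sequence back gives a list containing the original mapping, which matches the structure of the source file.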

In YAML, "- item" denotes a list item, so unless you put your output in a list, you won't get the "-".

Upvotes: 2

Anthon

Reputation: 76732

Your code does not work as expected, but it also does not behave the way you describe.

On the first run you will get this output in c1.yaml:

workload:
  name: c1
  param: {p1: 1, p2: 2}

because by default a load() followed by a dump() (in both ruamel.yaml and PyYAML) produces flow style for collection elements (mappings, sequences) that don't contain other collection elements.
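
In PyYAML the style is selected with the default_flow_style argument of dump(). The sketch below passes it explicitly in both calls, since the default value of that argument has, as far as I know, changed between PyYAML releases; the variable names are only illustrative:

```python
import yaml  # PyYAML

data = {"workload": {"name": "c1", "param": {"p1": 1, "p2": 2}}}

# None: collections that contain only scalars ("leaf" collections)
# are written in flow style, everything else in block style.
mixed = yaml.dump(data, default_flow_style=None)

# False: block style everywhere.
block = yaml.dump(data, default_flow_style=False)

print(mixed)
print(block)
```

Both strings load back to the same data; only the presentation differs.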

Additionally, if you ever invoke that routine a second time, your c1.yaml file will contain:

workload:
  name: c1
  param: {p1: 1, p2: 2}
workload:
  name: c1
  param: {p1: 1, p2: 2}

because you open for appending.
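
The difference between the two file modes can be checked without any YAML library. This is a small sketch using only the standard library; write_twice is a hypothetical helper that simulates running the routine two times:

```python
import os
import tempfile

def write_twice(mode):
    """Write the same text to one file twice with the given mode; return the file content."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "c1.yaml")
        for _ in range(2):  # simulate two runs of the routine
            with open(path, mode) as outfile:
                outfile.write("workload:\n  name: c1\n")
        with open(path) as infile:
            return infile.read()

appended = write_twice("a")     # second run adds a second copy
overwritten = write_twice("w")  # second run replaces the first copy

print(appended)
print(overwritten)
```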

Using load() for this kind of file splitting might also not be a good idea if you have no control over the source of the file and it possibly contains YAML tags.
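
If you stay with PyYAML, one way to guard against this is safe_load(), which only constructs basic Python types and refuses tags it does not know. A minimal sketch, where the tagged document is a made-up example of untrusted input:

```python
import yaml  # PyYAML

# A document carrying a Python-specific tag (illustrative input only).
tagged = "!!python/object/apply:os.getcwd []"

try:
    yaml.safe_load(tagged)
    rejected = False
except yaml.YAMLError:
    # safe_load() has no constructor for this tag and raises an error.
    rejected = True

# Plain, untagged YAML loads as ordinary lists, dicts and scalars.
plain = yaml.safe_load("- workload:\n    name: c1\n")

print(rejected)
print(plain)
```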

The above problems come on top of the missing top-level sequence element, which can easily be solved as @Ignacio Vazquez-Abrams indicated.

There is also no need to put the for statement within the first with statement, thereby delaying the closing of the input file: data is not an iterator.
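
Taken together, the points so far could be applied with PyYAML roughly as follows. This is a sketch under assumptions, not a drop-in replacement: split_workloads and its parameters are hypothetical names, safe_load() is used in place of load(), and the sample input is written to a temporary directory only so the example is self-contained:

```python
import os
import tempfile
import yaml  # PyYAML

def split_workloads(src_path, out_dir):
    """Write each top-level list item of src_path to <name>.yaml in out_dir."""
    with open(src_path, "r") as stream:
        data = yaml.safe_load(stream)
    # The input file is closed here; data is a plain list, not an iterator.
    written = []
    for workload in data:
        out_path = os.path.join(out_dir, workload["workload"]["name"] + ".yaml")
        with open(out_path, "w") as outfile:  # "w": overwrite, do not append
            # The list wrapper restores the leading "-"; block style throughout.
            yaml.dump([workload], outfile, default_flow_style=False)
        written.append(out_path)
    return written

SAMPLE = """\
- workload:
    name: c1
    param:
      p1: 1
      p2: 2
- workload:
    name: c2
    param:
      p1: 30
      p2: 200
"""

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "test.yaml")
    with open(src, "w") as f:
        f.write(SAMPLE)
    for path in split_workloads(src, tmp):
        with open(path) as f:
            print(os.path.basename(path))
            print(f.read())
```

Unlike the round-trip approach, this does not keep comments, quoting or the original key order.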

I suggest using ruamel.yaml¹ round-trip loading/dumping to do this right. Apart from preserving block or flow style, it also supports YAML 1.2, keeps your comments, preserves the order of the mapping keys, and can keep the quotes around scalar strings in the source:

import ruamel.yaml as yaml

def yaml_loader():
    try:
        with open("test.yaml", "r") as stream:
            data = yaml.round_trip_load(stream, preserve_quotes=True)
        for workload in data:
            with open(workload['workload']['name'] + '.yaml', 'w') as outfile:
                yaml.round_trip_dump([workload], outfile)
    except yaml.YAMLError as out:
        print(out)

yaml_loader()

will give you:

- workload:
    name: c1
    param:
      p1: 1
      p2: 2

¹ This was done using ruamel.yaml of which I am the author.

Upvotes: 0
