Plindgren

Pyyaml - Using different styles for keys and integers and strings

--- 
"main": 
  "directory": 
    "options": 
      "directive": 'options'
      "item": 
        "options": 'Stuff OtherStuff MoreStuff'
  "directoryindex": 
    "item": 
      "directoryindex": 'stuff.htm otherstuff.htm morestuff.html'
  "fileetag": 
    "item": 
      "fileetag": 'Stuff'
  "keepalive": 
    "item": 
      "keepalive": 'Stuff'
  "keepalivetimeout": 
    "item": 
      "keepalivetimeout": 2

Above is a YAML file which I need to parse, edit and then dump. I have chosen to do so with PyYAML on Python 2.7 (I need to use this). I have been able to parse and edit it.

However, the file uses different styles for keys (double-quoted), string values (single-quoted) and integers (plain), so I cannot set a single default style. I am now wondering how I can make PyYAML dump different styles for the different types.

Below is what I do to parse and edit:

import yaml

infile = yaml.safe_load(open('yamlfile'))  # safe_load is enough for plain mappings, strings and ints

# Recursive function to walk the nested dictionary and set keytoedit to newvalue
def edit(d, keytoedit=None, newvalue=None):
  for key, value in d.iteritems():
    if isinstance(value, dict) and key == keytoedit and 'item' in value:
      # top-level directive such as "keepalive": replace its "item" mapping
      value['item'] = {keytoedit: newvalue}
      edit(value, keytoedit=keytoedit, newvalue=newvalue)
    elif isinstance(value, dict) and keytoedit in value and 'item' not in value and key != 'main':
      # nested mapping that holds the key directly (e.g. the "item" mapping)
      value[keytoedit] = newvalue
      edit(value, keytoedit=keytoedit, newvalue=newvalue)
    elif isinstance(value, dict):
      # any other mapping: keep searching
      edit(value, keytoedit=keytoedit, newvalue=newvalue)

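edit(infile, keytoedit='keepalive', newvalue='On')  # example call (hypothetical), e.g. giving the 'On' in the output below

with open('outfile', 'w') as outfile:
    yaml.dump(infile, outfile, default_flow_style=False)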
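Because every directive in this file except "directory" is stored as `"name": {"item": {"name": value}}` under "main", the recursive search could also be replaced by a direct lookup. A minimal sketch, assuming exactly that layout; `set_directive` is a hypothetical helper, not part of PyYAML, and it works on the dict returned by the loader:

```python
def set_directive(data, name, new_value):
    """Set data["main"][name]["item"][name] to new_value and return the old value.

    Assumes the layout of the question's file; raises KeyError if the path
    does not exist (e.g. for "directory", which is nested differently).
    """
    item = data["main"][name]["item"]
    old_value = item[name]
    item[name] = new_value
    return old_value


config = {"main": {"keepalive": {"item": {"keepalive": "Stuff"}}}}
set_directive(config, "keepalive", "On")
print(config["main"]["keepalive"]["item"]["keepalive"])
```

A direct path fails loudly on a typo instead of silently editing nothing, which the recursive version would do.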

So, how can I achieve that? If I use default_style in yaml.dump, all types get the same style, but I need to follow the conventions of the original YAML file.

Can I somehow specify styles for specific types with pyyaml?

Edit: Here is what I get so far; the missing pieces are the double quotes on the keys and the single quotes on the strings.

main:
  directory:
    options:
      directive: options
      item:
        options: Stuff OtherStuff MoreStuff
  directoryindex:
    item:
      directoryindex: stuff.html otherstuff.htm morestuff.html
  fileetag:
    item:
      fileetag: Stuff
  keepalive:
    item:
      keepalive: 'On'
  keepalivetimeout:
    item:
      keepalivetimeout: 2

Answers (2)

yopLa

Use ruamel.yaml instead

It is better documented than PyYAML: https://pypi.org/project/ruamel.yaml/

Example of the template.yaml file I want to read:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  lambda_explicit_matchning

  Sample SAM Template for lambda_explicit_matchning

# More info about Globals: https://github.com/awslabs/serverless-application-model/blob/master/docs/globals.rst
Globals:
  Function:
    Timeout: 900

Resources:
  ExplicitAlgoFunction:
    Type: AWS::Serverless::Function # More info about Function Resource: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#awsserverlessfunction

    Properties:
      MemorySize: 3008

As you can see in my example, the string '2010-09-09' is quoted while the integers are not.

Loading and parsing that YAML file is then as simple as this (no need to worry about style):

    from ruamel.yaml import YAML

    yaml = YAML()
    with open("template.yaml") as f:
        sam_yaml = yaml.load(f)

ruamel.yaml reads the YAML file without you having to worry about the style :D

Anthon

You can at least preserve the original flow/block style for the various elements with the normal yaml.dump() for some value of "normal".

What you need is a loader that saves the flow/block style information while reading the data, plus subclasses of the normal types that have a style (mappings/dicts and sequences/lists) that behave like the Python objects normally returned by the loader but carry the style information. Then, on the way out, you give yaml.dump a custom dumper that takes this style information into account.
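For scalars, the same "subclass plus custom dumper" idea also works in plain PyYAML. A minimal sketch, assuming PyYAML 5.1 or later (for `sort_keys`) and Python 3; `DQ`, `SQ`, `StyledDumper`, `restyle` and `dump_styled` are hypothetical names, not part of PyYAML. Unlike the loader described above, it does not record styles while reading; it reapplies the question's convention (keys double-quoted, string values single-quoted, integers plain) just before dumping:

```python
import yaml


class DQ(str):
    """str subclass marking a scalar that should be emitted double-quoted."""


class SQ(str):
    """str subclass marking a scalar that should be emitted single-quoted."""


def _represent_dq(dumper, data):
    return dumper.represent_scalar("tag:yaml.org,2002:str", str(data), style='"')


def _represent_sq(dumper, data):
    return dumper.represent_scalar("tag:yaml.org,2002:str", str(data), style="'")


class StyledDumper(yaml.SafeDumper):
    """Own subclass, so the representers do not leak into yaml.SafeDumper."""


StyledDumper.add_representer(DQ, _represent_dq)
StyledDumper.add_representer(SQ, _represent_sq)


def restyle(node):
    """Wrap (string) keys in DQ and string values in SQ; ints stay plain."""
    if isinstance(node, dict):
        return {DQ(k): restyle(v) for k, v in node.items()}
    if isinstance(node, str):
        return SQ(node)
    return node


def dump_styled(data, stream=None):
    """Dump with the quoting convention; returns a str when stream is None."""
    return yaml.dump(restyle(data), stream, Dumper=StyledDumper,
                     default_flow_style=False, sort_keys=False,
                     explicit_start=True)


print(dump_styled({"main": {"fileetag": {"item": {"fileetag": "Stuff"}}}}))
```

Marking values with subclasses keeps the data usable as ordinary strings while giving the dumper a type to dispatch on.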

I use the normal yaml.dump in my enhanced version of PyYAML, called ruamel.yaml, but with special loader and dumper classes (RoundTripLoader for yaml.load and RoundTripDumper for yaml.dump) that preserve the flow/block style (and any comments you might have in the file):

import ruamel.yaml as yaml

infile = yaml.load(open('yamlfile'), Loader=yaml.RoundTripLoader)

for key, value in infile['main'].items():
    if key == 'keepalivetimeout':
        item = value['item']
        item['keepalivetimeout'] = 400

print yaml.dump(infile, Dumper=yaml.RoundTripDumper)

gives you:

main:
  directory:
    options:
      directive: options
      item:
        options: Stuff OtherStuff MoreStuff
  directoryindex:
    item:
      directoryindex: stuff.htm otherstuff.htm morestuff.html
  fileetag:
    item:
      fileetag: Stuff
  keepalive:
    item:
      keepalive: Stuff
  keepalivetimeout:
    item:
      keepalivetimeout: 400

If you cannot install ruamel.yaml, you can pull the code out of my repository and include it in your project; AFAIK PyYAML has not been updated since I started working on this.

I currently don't preserve the superfluous quotes on the scalars, although I do preserve the chomping information (for multiline scalars starting with '|'). The quote information is thrown away very early in the processing of the YAML input, and preserving it would require multiple changes.

Since you use different quotes for key and value string scalars, you can get the output you want by overriding process_scalar (part of the Emitter in emitter.py) so that it picks the quote style based on whether the scalar is a key and whether it is an integer:

import ruamel.yaml as yaml

# the scalar emitter from emitter.py
def process_scalar(self):
    if self.analysis is None:
        self.analysis = self.analyze_scalar(self.event.value)
    if self.style is None:
        self.style = self.choose_scalar_style()
    split = (not self.simple_key_context)
    # VVVVVVVVVVVVVVVVVVVV added
    try:
        int(self.event.value)  # might need to expand this
    except ValueError:
        # we have string
        if split:
            self.style = "'"
        else:
            self.style = '"'
    # ^^^^^^^^^^^^^^^^^^^^
    # if self.analysis.multiline and split    \
    #         and (not self.style or self.style in '\'\"'):
    #     self.write_indent()
    if self.style == '"':
        self.write_double_quoted(self.analysis.scalar, split)
    elif self.style == '\'':
        self.write_single_quoted(self.analysis.scalar, split)
    elif self.style == '>':
        self.write_folded(self.analysis.scalar)
    elif self.style == '|':
        self.write_literal(self.analysis.scalar)
    else:
        self.write_plain(self.analysis.scalar, split)
    self.analysis = None
    self.style = None
    if self.event.comment:
        self.write_post_comment(self.event)


infile = yaml.load(open('yamlfile'), Loader=yaml.RoundTripLoader)

for key, value in infile['main'].items():
    if key == 'keepalivetimeout':
        item = value['item']
        item['keepalivetimeout'] = 400

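class QuotingDumper(yaml.RoundTripDumper):
    # subclass instead of patching RoundTripDumper itself, so other dumps are unaffected
    process_scalar = process_scalar

print '---'
print yaml.dump(infile, Dumper=QuotingDumper)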

gives you:

---
"main":
  "directory":
    "options":
      "directive": 'options'
      "item":
        "options": 'Stuff OtherStuff MoreStuff'
  "directoryindex":
    "item":
      "directoryindex": 'stuff.htm otherstuff.htm morestuff.html'
  "fileetag":
    "item":
      "fileetag": 'Stuff'
  "keepalive":
    "item":
      "keepalive": 'Stuff'
  "keepalivetimeout":
    "item":
      "keepalivetimeout": 400

which is quite close to what you asked for.
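The quoting rule inside the overridden process_scalar can be pulled out into a plain function and checked on its own. A minimal sketch; `choose_quote` is a hypothetical name, and like the original it treats anything `int()` accepts as an integer (note that `int()` also accepts surrounding whitespace, and underscores such as `'1_000'` on Python 3.6+), so the check may indeed need expanding:

```python
def choose_quote(value, is_key):
    """Mirror the quoting rule of the process_scalar override.

    Returns None (keep the emitter's own choice, i.e. plain) for scalars
    that int() accepts, '"' for other keys and "'" for other values.
    In the emitter, is_key corresponds to self.simple_key_context.
    """
    try:
        int(value)
    except ValueError:
        return '"' if is_key else "'"
    return None


for text, is_key in [("main", True), ("Stuff", False), ("400", False)]:
    print(text, repr(choose_quote(text, is_key)))
```

Catching only `ValueError` (rather than a bare `except:`) keeps unrelated errors, such as a `KeyboardInterrupt`, from being silently turned into a quote choice.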
