Nicholas Tripp

Reputation: 71

Python YAML dump: remove extra junk

I created a dictionary of sets:

db = defaultdict(lambda: defaultdict(set))

Then I iterated through a database and added what I needed from the rows:

db['greenhouse1']['fruits'].append('apples')  
db['greenhouse1']['fruits'].append('oranges')
db['greenhouse1']['colors'] = ["red", "orange"]

db['greenhouse2']['fruits'].append('banana')

Calling yaml.dump(db) then adds a lot of output I don't want:

greenhouse1: !!python/object/apply:collections.defaultdict
  args:
    - *d001
  dictitems:
    fruits:
    - oranges
    - apples
    colors:
    - orange
    - red

I don't want the args and dictitems entries, just the level below them.

Upvotes: 1

Views: 1519

Answers (2)

Anthon

Reputation: 76912

There are all kinds of weird things going on. E.g. you cannot append to a set, as your code claims to do. Are you sure you didn't specify list as the argument to the nested defaultdict?

In any case, your "junk" is caused by the way PyYAML dumps complex objects such as defaultdict: it tags them so they can be loaded back as the same type, instead of dumping them as normal dicts.
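As a minimal sketch of that point, staying with PyYAML: if the data is converted to plain dicts and lists before dumping, there is nothing left to tag. The helper name to_plain is hypothetical, and the example assumes list as the inner default, as guessed above.

```python
from collections import defaultdict


def to_plain(obj):
    """Recursively turn defaultdicts (and other dicts) into plain dicts."""
    if isinstance(obj, dict):
        return {key: to_plain(value) for key, value in obj.items()}
    if isinstance(obj, (set, frozenset)):
        # PyYAML tags sets as well (!!set); a sorted list dumps as a plain
        # sequence. Assumes the elements can be compared with each other.
        return sorted(obj)
    if isinstance(obj, list):
        return [to_plain(item) for item in obj]
    return obj


db = defaultdict(lambda: defaultdict(list))
db['greenhouse1']['fruits'].append('apples')
db['greenhouse1']['fruits'].append('oranges')
db['greenhouse2']['fruits'].append('banana')

plain_db = to_plain(db)
print(type(plain_db['greenhouse1']).__name__)  # prints "dict"
```

The resulting plain_db can then be passed to yaml.dump(plain_db, default_flow_style=False) in place of db.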

What I recommend is using ruamel.yaml instead: it handles YAML 1.2 (which replaced YAML 1.1, the version PyYAML partly supports, back in 2009), its dump handles UTF-8 by default, and it can work with Path instances in addition to opened files.

Just make a representer for defaultdict that does away with the defaultdict-ness:

from collections import defaultdict
from pathlib import Path
import ruamel.yaml

outfile = Path('db.yaml')

db = defaultdict(lambda: defaultdict(list))

db['greenhouse1']['fruits'].append('apples')  
db['greenhouse1']['fruits'].append('oranges')
db['greenhouse1']['colors'] = ["red", "orange"]

db['greenhouse2']['fruits'].append('banana')


def default_dict_to_yaml(representer, data):
    return representer.represent_dict(dict(data.items()))

yaml = ruamel.yaml.YAML()
# yaml.indent(mapping=4, sequence=4, offset=2)
yaml.Representer.add_representer(defaultdict, default_dict_to_yaml)
yaml.dump(db, outfile)

print(outfile.read_text())

This shows that your db.yaml contains:

greenhouse1:
  fruits:
  - apples
  - oranges
  colors:
  - red
  - orange
greenhouse2:
  fruits:
  - banana

This works without first having to write to a JSON file.

Of course, this (and your solution) doesn't load back to a defaultdict. If you want that instead, you should look at this answer, but it will get you some "junk" in the file so that Python knows what to default to in the loaded defaultdict.
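If the file has to stay free of tags, a different approach from the one in the linked answer is to rebuild the defaultdict after loading. This is only a sketch: the helper name to_nested_defaultdict is hypothetical, the default factory is hard-coded to list, and loaded stands in for whatever the YAML loader returns.

```python
from collections import defaultdict


def to_nested_defaultdict(plain):
    """Copy a plain two-level mapping into defaultdict(lambda: defaultdict(list))."""
    db = defaultdict(lambda: defaultdict(list))
    for outer_key, inner in plain.items():
        for inner_key, values in inner.items():
            db[outer_key][inner_key].extend(values)
    return db


# Stand-in for the plain dict a YAML loader would return for db.yaml.
loaded = {
    'greenhouse1': {'fruits': ['apples', 'oranges'], 'colors': ['red', 'orange']},
    'greenhouse2': {'fruits': ['banana']},
}

db = to_nested_defaultdict(loaded)
db['greenhouse3']['fruits'].append('kiwi')  # missing keys get defaults again
print(db['greenhouse3']['fruits'])  # prints ['kiwi']
```

The trade-off is that the knowledge about the default lives in the loading code instead of in the YAML file.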

Upvotes: 1

Nicholas Tripp

Reputation: 71

Stumbled on a weird solution. Basically: dump to JSON, load the JSON back and update the object, then dump the YAML.

import json
from os import path

import yaml

# db is the nested defaultdict built in the question.

# Merge any previously saved data into db.
if path.exists("db.json"):
    with open("db.json") as j:
        db.update(json.load(j))

# Save db as JSON.
with open("db.json", "w") as outfile:
    json.dump(db, outfile)

# Read the JSON back and merge it into db again.
with open("db.json") as j:
    db.update(json.load(j))

# Dump the updated db as block-style YAML.
with open("db.yaml", "w") as outfile:
    yaml.dump(db, outfile, default_flow_style=False)

I do not understand why dumping and loading would make this work, but it works.
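A plausible explanation, sketched here with the standard library only: json.dumps accepts a defaultdict because it is a dict subclass, while json.loads always builds plain dict and list objects, so whatever comes back from the round trip has lost its defaultdict-ness. The list default below is an assumption taken from the other answer.

```python
import json
from collections import defaultdict

db = defaultdict(lambda: defaultdict(list))
db['greenhouse1']['fruits'].append('apples')
db['greenhouse2']['fruits'].append('banana')

# Serialize to a JSON string and parse it straight back.
round_tripped = json.loads(json.dumps(db))

print(type(db['greenhouse1']).__name__)             # prints "defaultdict"
print(type(round_tripped['greenhouse1']).__name__)  # prints "dict"
```

Plain dicts and lists are what the YAML dumper writes without any extra tags.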

Upvotes: 0
