Shawn Taylor

Reputation: 440

Parsing yaml file and getting a dictionary

I'd like to be able to take the YAML defined below and turn it into a dictionary.

development:
    user:dev_uid
    pass:dev_pwd
    host:127.0.0.1
    database:dev_db

production:
    user:uid
    pass:pwd
    host:127.0.0.2
    database:db

I have been able to use the YAML library to load the data. However, each environment's entries end up in my dictionary as one long string.

This code:

#!/usr/bin/python3

import yaml

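# safe_load avoids the Loader argument that newer PyYAML versions require for yaml.load
with open('database.conf', 'r') as f:
    config = yaml.safe_load(f)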

print(config['development'])

yields the following output.

user:dev_uid pass:dev_pwd host:127.0.0.1 database:dev_db

I can't access any of the entries by key name, nor can I subsequently load that string using the yaml.load method.

print(config['development']['user'])

This code yields the following error:

TypeError: string indices must be integers

Ideally I would like to end up with a parsing function that returns a dictionary or a list so I can access the properties by key name or using the dot operator like:

print(config['development']['user'])
config.user

Where am I going wrong?

Upvotes: 6

Views: 21790

Answers (3)

wim

Reputation: 362478

Your "yaml" is not a mapping of mappings, it's a mapping of strings. In YAML 1.2, block mapping entries need whitespace after the separator, e.g.

development:
    user: dev_uid
    pass: dev_pwd
    host: 127.0.0.1
    database: dev_db

production:
    user: uid
    pass: pwd
    host: 127.0.0.2
    database: db

Don't try to pre-process this text. Instead, find who generated the markup and throw the spec at them.
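To see the difference the space makes, here is a minimal sketch using PyYAML's safe_load on a shortened copy of the data. The variable names (broken, fixed, broken_cfg, fixed_cfg) are only for illustration and are not from the question.

```python
import yaml

# Without a space after the colon, the indented lines fold into one plain scalar.
broken = """\
development:
    user:dev_uid
    pass:dev_pwd
"""

# With a space after the colon, each line is a key/value pair of a nested mapping.
fixed = """\
development:
    user: dev_uid
    pass: dev_pwd
"""

broken_cfg = yaml.safe_load(broken)
fixed_cfg = yaml.safe_load(fixed)

print(type(broken_cfg['development']).__name__)  # str
print(fixed_cfg['development']['user'])          # dev_uid
```

Only the second form supports config['development']['user'].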

Upvotes: 10

Anthon

Reputation: 76578

Your YAML is absolutely valid, and that is why you don't get an error when loading it. It doesn't load as you expect because YAML lets (long) lines wrap at whitespace, and that applies to unquoted scalars such as your

user:dev_uid
pass:dev_pwd
host:127.0.0.1
database:dev_db

Your YAML file is equivalent to:

development: "user:dev_uid pass:dev_pwd host:127.0.0.1 database:dev_db"
production: "user:uid pass:pwd host:127.0.0.2 database:db"

and to

development: user:dev_uid pass:dev_pwd host:127.0.0.1 database:dev_db
production: user:uid pass:pwd host:127.0.0.2 database:db

The quotes are not necessary: the value for development cannot be mistaken for a mapping, because for that the colon after each key would have to be followed by a space. This can be seen from the older (now outdated) YAML 1.1 specification that was used to implement PyYAML¹.

The best approach is to convert the data and write out corrected YAML, which is easy if you can assume that none of the keys or values contain embedded spaces:

import sys
import yaml


yaml_str = """\
development:
    user:dev_uid
    pass:dev_pwd
    host:127.0.0.1
    database:dev_db

production:
    user:uid
    pass:pwd
    host:127.0.0.2
    database:db
"""

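data = yaml.safe_load(yaml_str)
for key in data:
    val = data[key]
    # only plain strings containing a colon are collapsed mappings
    if not isinstance(val, str) or ':' not in val:
        continue
    data[key] = tmp = {}
    for x in val.split():
        # split on the first colon only, so values may contain colons
        k, v = x.split(':', 1)
        tmp[k] = v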

yaml.safe_dump(data, sys.stdout, default_flow_style=False)

If your file is more complicated than what you presented, you might have to recurse into dict values and list items, which is fairly trivial.

The above outputs:

development:
  database: dev_db
  host: 127.0.0.1
  pass: dev_pwd
  user: dev_uid
production:
  database: db
  host: 127.0.0.2
  pass: pwd
  user: uid

which then loads as you expect without the hassle.
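The recursion into dict values and list items mentioned above could look roughly like the following sketch. The helper name expand is hypothetical, and it assumes that any string whose whitespace-separated tokens all contain a colon is a collapsed mapping. That assumption misfires on values such as URLs, so adjust the test to your data.

```python
def expand(node):
    """Recursively turn 'k:v k2:v2' strings into dicts (hypothetical helper)."""
    if isinstance(node, dict):
        return {key: expand(value) for key, value in node.items()}
    if isinstance(node, list):
        return [expand(item) for item in node]
    if isinstance(node, str):
        parts = node.split()
        # Assumption: convert only when every token looks like key:value.
        if parts and all(':' in part for part in parts):
            # Split on the first colon only, so values may contain colons.
            return dict(part.split(':', 1) for part in parts)
    return node


# Stand-in for what yaml.safe_load returns for a file in the broken format.
loaded = {
    'development': 'user:dev_uid pass:dev_pwd host:127.0.0.1 database:dev_db',
    'servers': ['name:a port:1', 'plain text'],
}

fixed = expand(loaded)
print(fixed['development']['user'])  # dev_uid
print(fixed['servers'])              # [{'name': 'a', 'port': '1'}, 'plain text']
```

All values stay strings here; nothing is re-parsed as a number or boolean.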


¹The newer YAML 1.2 allows key-value pairs without a space after the colon in flow-style mappings. The prerequisite is that the key is JSON-like, for example (double) quoted as below. This change was necessary to make YAML 1.2 compatible with JSON:

development: {
    "user":"dev_uid",
    "pass":"dev_pwd",
    "host":"127.0.0.1",
    "database":"dev_db"
  }

Upvotes: -2

vasia

Reputation: 1172

Since you are not getting what you want with the yaml module immediately, your .conf file is probably using a format different than what the yaml module currently expects.

This code is a quick workaround that gives you the dictionary you want:

for mainkey in ['production','development']:
    d = {}
    for item in config[mainkey].split():
        key, value = item.split(':', 1)  # split on the first colon only
        d[key] = value
    config[mainkey] = d
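The question also asks about dot-operator access. Once you have a nested dictionary, one possible sketch uses types.SimpleNamespace from the standard library. The helper name to_namespace is made up for this example, and the config literal stands in for the dictionary built above.

```python
from types import SimpleNamespace


def to_namespace(node):
    """Wrap nested dicts in SimpleNamespace for attribute access (hypothetical helper)."""
    if isinstance(node, dict):
        return SimpleNamespace(**{key: to_namespace(value) for key, value in node.items()})
    return node


# Stand-in for the dictionary produced by the workaround.
config = {
    'development': {'user': 'dev_uid', 'pass': 'dev_pwd',
                    'host': '127.0.0.1', 'database': 'dev_db'},
}

ns = to_namespace(config)
print(ns.development.user)              # dev_uid
# 'pass' is a Python keyword, so that entry needs getattr instead of dot syntax.
print(getattr(ns.development, 'pass'))  # dev_pwd
```

Keys that are not valid identifiers (or are keywords, like pass) are only reachable through getattr, so plain dictionary access may still be the simpler choice.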

Upvotes: 2
