Eleni

Reputation: 645

Overwrite variable value in yaml mapping

I have defined a mapping in yaml that looks like:

default: &DEFAULT
    bucket: &bucket  default_path
    # Make sure that the second parameter of join doesn't start with a / 
    # otherwise it is interpreted as an absolute path and join won't work
    path1: !!python/object/apply:os.path.join [*bucket, work_area/test1]
    path2: !!python/object/apply:os.path.join [*bucket, work_area/test2]

I need to define more keys where the only value to be overwritten is bucket, something like:

production:
    <<: *DEFAULT
    bucket: "s3://production-bucket"

but I still get
conf['production']['path1'] => 'default_path/work_area/test1'
instead of
conf['production']['path1'] => 's3://production-bucket/work_area/test1'.
Is there any way to do this in yaml?

As is obvious from the syntax, I use PyYAML to parse the file.

Upvotes: 1

Views: 2629

Answers (1)

Anthon

Reputation: 76872

YAML interpreters should take the most recent definition of an anchor:

An alias node is denoted by the “*” indicator. The alias refers to the most recent preceding node having the same anchor. It is an error for an alias node to use an anchor that does not previously occur in the document. It is not an error to specify an anchor that is not used by any alias node.

So even if PyYAML (3.10/3.11) did not throw a ComposerError when you try to parse:

default: &DEFAULT
    bucket: &bucket  default_path
    # Make sure that the second parameter of join doesn't start with a / 
    # otherwise it is interpreted as an absolute path and join won't work
    path1: !!python/object/apply:os.path.join [*bucket, work_area/test1]
    path2: !!python/object/apply:os.path.join [*bucket, work_area/test2]
production:
    <<: *DEFAULT
    bucket: &bucket "s3://production-bucket"

inserting the path1 and path2 keys with <<: *DEFAULT would give you their expanded versions, with default_path, as that is the definition available to the parser when it reads [*bucket, work_area/test1].

The "expansion" of the alias is done as soon as the alias is read in from the YAML source, not at some point at the end of the file, when all anchored data has been read in.
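The effect can be modelled in plain Python without any YAML library. This is only a sketch: the dictionaries stand in for what a loader builds, the variable names are made up for illustration, and posixpath.join is used instead of os.path.join so that the result is the same on every platform.

```python
import posixpath

# The anchor &bucket is bound to a value at the moment the parser reads it.
bucket = "default_path"

# Each alias *bucket is replaced by that value right away, so the joined
# paths are already plain strings when the "default" mapping is complete.
default = {
    "bucket": bucket,
    "path1": posixpath.join(bucket, "work_area/test1"),
    "path2": posixpath.join(bucket, "work_area/test2"),
}

# "<<: *DEFAULT" copies the finished key/value pairs. Overriding the key
# "bucket" afterwards cannot reach back into path1 or path2.
production = {**default, "bucket": "s3://production-bucket"}

print(production["bucket"])  # s3://production-bucket
print(production["path1"])   # default_path/work_area/test1
```

The key bucket changes, but path1 keeps the value that was computed when the alias was first read.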


In your updated example, there is no anchor named bucket other than the one on the scalar "default_path". You are confusing yourself by using the same name for the anchor and the key (bucket), but key names are completely irrelevant for resolving the alias *bucket.

If you can rearrange your YAML you might get something acceptable to your use case by doing ¹:

import ruamel.yaml

yaml_str = """\
default: &DEFAULT
    bucket: &klm default_path
production:
    &klm "s3://production-bucket"
result:
    <<: *DEFAULT
    # Make sure that the second parameter of join doesn't start with a /
    # otherwise it is interpreted as an absolute path and join won't work
    path1: !!python/object/apply:os.path.join [*klm, work_area/test1]
    path2: !!python/object/apply:os.path.join [*klm, work_area/test2]
"""

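# python/object/apply tags are only constructed by the unsafe loader,
# so use this on trusted input only
yaml = ruamel.yaml.YAML(typ='unsafe', pure=True)
conf = yaml.load(yaml_str)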
print(conf['result']['path1'])

which will give you:

s3://production-bucket/work_area/test1
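If rearranging the YAML is not an option, another possibility is to keep only the relative parts in the configuration and do the join in Python after loading. A minimal sketch, assuming the loader returned plain dictionaries like the ones below. The helper resolve_paths and the key relative_paths are hypothetical names, not part of PyYAML or ruamel.yaml, and posixpath.join is used so the separator stays a forward slash.

```python
import posixpath


def resolve_paths(section):
    """Return a new dict with every relative path joined to the section's bucket."""
    resolved = {"bucket": section["bucket"]}
    for name, relative in section["relative_paths"].items():
        resolved[name] = posixpath.join(section["bucket"], relative)
    return resolved


# Stand-in for loaded data (hypothetical layout): "production" is the
# result of merging "default" and overriding only the bucket key.
default = {
    "bucket": "default_path",
    "relative_paths": {"path1": "work_area/test1", "path2": "work_area/test2"},
}
production = {**default, "bucket": "s3://production-bucket"}

# The join happens here, after the override, so each section uses its own bucket.
raw = {"default": default, "production": production}
conf = {name: resolve_paths(section) for name, section in raw.items()}

print(conf["default"]["path1"])     # default_path/work_area/test1
print(conf["production"]["path1"])  # s3://production-bucket/work_area/test1
```

Because the join runs after the merge, overriding bucket is enough and no Python-specific tags are needed in the YAML.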

¹ This was done using ruamel.yaml of which I am the author.

Upvotes: 1
