The following snippet attempts to parse a YAML document representing three users of an app store:
try {
  jsyaml.load(y.textContent);
} catch (e) {
  console.error(e.message);
}
<script src="https://unpkg.com/[email protected]/dist/js-yaml.min.js"></script>
<script id="y" type="text/yaml">
jeff: &jeff
  FullName: Jeff Miller
  ParentalConsentRequired: false
joe:
  FullName: Joe Miller
  ParentalConsentRequired: true
  Parents:
    - *jeff
    - *mary
mary: &mary
  FullName: Mary Miller
  ParentalConsentRequired: false
</script>
but this is invalid YAML because the alias *mary comes before the anchor &mary, whereas it must be subsequent to it.
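When the references allow it, the alias-before-anchor problem can be fixed by reordering the top-level entries, which amounts to a topological sort of the top-level keys by their alias dependencies. Here is a minimal sketch, assuming those dependencies have already been extracted into a plain object (`deps` is a hypothetical input format; a real tool would have to derive it from the parse). It keeps the original key order wherever it can and returns null when the keys reference each other in a cycle, i.e. when no flat order exists and some anchor has to be nested.

```javascript
// Order top-level keys so that every key comes after the keys it aliases.
// `keys` is the original key order; `deps[k]` lists the keys whose anchors
// k refers to (a hypothetical, pre-extracted input format).
// Returns null for a cycle (or a reference to a key that does not exist).
function anchorFirstOrder(keys, deps) {
  const placed = new Set();
  const order = [];
  while (order.length < keys.length) {
    // Take the earliest remaining key whose dependencies are all placed,
    // which keeps the original order wherever the dependencies allow it.
    const next = keys.find(
      (k) => !placed.has(k) && (deps[k] || []).every((d) => placed.has(d))
    );
    if (next === undefined) return null; // no key can be placed next
    placed.add(next);
    order.push(next);
  }
  return order;
}

// The users from the question: joe aliases jeff and mary.
const order = anchorFirstOrder(["jeff", "joe", "mary"], { joe: ["jeff", "mary"] });
console.log(order); // [ 'jeff', 'mary', 'joe' ]
```

With mutual references such as `anchorFirstOrder(["a", "b"], { a: ["b"], b: ["a"] })` the function returns null, which is exactly the case where a dumper has to fall back to nesting one anchored node inside another.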
This simple example could be repaired by sorting the mapping nodes differently and putting mary: before joe:. But more complex examples¹ may not have such an obvious topological sort and may end up looking like the output of this snippet:
const obj = {
  jeff: { FullName: "Jeff Miller", ParentalConsentRequired: false },
  joe: { FullName: "Joe Miller", ParentalConsentRequired: true },
  mary: { FullName: "Mary Miller", ParentalConsentRequired: false }
};
obj.joe.Parents = [obj.jeff, obj.mary];
y.textContent = jsyaml.dump(obj);
<script src="https://unpkg.com/[email protected]/dist/js-yaml.min.js"></script>
<textarea rows="15" cols="80" id="y"></textarea>
Such nested anchors put objects of the same class at different indentation levels, which I think hurts the legibility of the YAML document.
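The nesting follows from a simple dumping strategy: write each shared object in full at the first place it is reached in traversal order, and use an alias everywhere else. The anchor then lands wherever that first occurrence happens to be. The sketch below computes those positions for a plain object graph; it illustrates the strategy and is not js-yaml's actual code (that js-yaml behaves this way is an assumption based on the output above), and the dotted path notation is made up for readability.

```javascript
// Find where a "first occurrence gets the anchor" dumper would put anchors.
// Returns the paths (hypothetical dotted notation) of every object that is
// reached more than once, in the order in which they are first reached.
function anchorPaths(root) {
  const firstPath = new Map(); // object -> path of its first occurrence
  const visits = new Map();    // object -> how many times it is reached
  function walk(node, path) {
    if (node === null || typeof node !== "object") return; // scalars never need anchors
    visits.set(node, (visits.get(node) || 0) + 1);
    if (firstPath.has(node)) return; // later occurrences would become aliases
    firstPath.set(node, path);
    for (const [key, child] of Object.entries(node)) {
      walk(child, path ? `${path}.${key}` : key);
    }
  }
  walk(root, "");
  return [...visits].filter(([, n]) => n > 1).map(([node]) => firstPath.get(node));
}

const obj = {
  jeff: { FullName: "Jeff Miller", ParentalConsentRequired: false },
  joe: { FullName: "Joe Miller", ParentalConsentRequired: true },
  mary: { FullName: "Mary Miller", ParentalConsentRequired: false }
};
obj.joe.Parents = [obj.jeff, obj.mary];
console.log(anchorPaths(obj)); // [ 'jeff', 'joe.Parents.1' ]
```

Mary is first reached through joe's Parents list, before the top-level mary key, so under this strategy her anchor ends up nested inside joe.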
Do YAML tools address this legibility issue? (I'm not seeking recommendations, just an answer from someone with an overview of the technology.)
I can imagine two main approaches:
Do these approaches exist in the wild?
Addendum: After reading David Maze's answer, I would restrict myself to YAML documents without duplicate anchors.
¹ Like this one, which made me think about this topic.
It's certainly possible to write a YAML-ish parser that doesn't strictly conform to YAML's rules, but then I wouldn't call it "YAML".
One interesting bit of commentary is in YAML 1.2.2 §3.2.2.2, "Anchors and Aliases", which states (emphasis mine):
When composing a representation graph from serialized events, an alias event refers to the most recent event in the serialization having the specified anchor. Therefore, anchors need not be unique within a serialization.
For your second point, you could come up with a rule for an alias whose anchor hasn't appeared yet in the document: choose the first appearance, the last one, or something else. You'd still need a configuration option to allow best-effort parsing of otherwise-invalid documents.
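A minimal sketch of such a rule, assuming a made-up, flattened event list rather than any real parser's API: an alias resolves to the most recent anchor as §3.2.2.2 describes, and only an alias whose anchor has not appeared yet falls back to the invented `forward` option. That option defaults to rejecting the document, so strict behavior stays the norm.

```javascript
// Resolve aliases in a simplified, hypothetical event list. Each event either
// defines an anchor ({ anchor, value }) or refers to one ({ alias }).
// `forward` is an invented option: "error" (strict YAML), "first" or "last".
function resolveAliases(events, forward = "error") {
  const latest = new Map(); // anchor name -> value of its most recent definition
  return events.map((ev, i) => {
    if ("anchor" in ev) {
      latest.set(ev.anchor, ev.value); // a redefinition shadows earlier ones
      return ev.value;
    }
    if (latest.has(ev.alias)) return latest.get(ev.alias); // YAML's own rule
    // Best-effort, non-standard handling of an alias before its anchor.
    const later = events.slice(i + 1).filter((e) => e.anchor === ev.alias);
    if (forward === "error" || later.length === 0) {
      throw new Error(`unidentified alias "${ev.alias}"`);
    }
    return (forward === "first" ? later[0] : later[later.length - 1]).value;
  });
}

const events = [
  { anchor: "a", value: "A1" },
  { alias: "a" },
  { alias: "b" }, // forward reference: invalid in strict YAML
  { anchor: "b", value: "B1" },
  { anchor: "b", value: "B2" }
];
console.log(resolveAliases(events, "first")); // [ 'A1', 'A1', 'B1', 'B1', 'B2' ]
console.log(resolveAliases(events, "last"));  // [ 'A1', 'A1', 'B2', 'B1', 'B2' ]
```

Calling `resolveAliases(events)` without an option throws, which mirrors what a conforming parser does with the question's document.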
Trying to reorder a document can be tricky, though. Consider this (invalid) document:
one: &a
  x: &x 1
top:
  one: *a
  two: *b # <-- invalid reference
  x: *x
two: &b
  x: &x 2
If you remove top: { two: }, then this is a valid file that produces top: { x: 1 } by the most-recent-appearance rule. But if you moved the top-level two: above top: to make *b valid, you've now changed which &x is the most recent, and top's x would become 2.
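To make that trade-off concrete, here is a small sketch that applies the most-recent-appearance rule to the example's three top-level entries in two different orders. The entry/event model and the node labels such as "one.x (1)" are hypothetical simplifications, not the output of any parser; a null binding marks an alias whose anchor has not appeared yet.

```javascript
// Each top-level entry lists, in document order, the anchors it defines
// ({ anchor, node }) and the aliases it uses ({ alias }). Hypothetical model.
function bindings(entries) {
  const current = new Map(); // anchor name -> node it currently denotes
  const result = {};
  for (const { key, events } of entries) {
    for (const ev of events) {
      if ("anchor" in ev) current.set(ev.anchor, ev.node);
      else result[`${key}: *${ev.alias}`] = current.get(ev.alias) ?? null;
    }
  }
  return result;
}

const oneEntry = { key: "one", events: [{ anchor: "a", node: "one" }, { anchor: "x", node: "one.x (1)" }] };
const topEntry = { key: "top", events: [{ alias: "a" }, { alias: "b" }, { alias: "x" }] };
const twoEntry = { key: "two", events: [{ anchor: "b", node: "two" }, { anchor: "x", node: "two.x (2)" }] };

console.log(bindings([oneEntry, topEntry, twoEntry])); // *b unresolved, *x bound to "one.x (1)"
console.log(bindings([oneEntry, twoEntry, topEntry])); // *b bound to "two", but *x now "two.x (2)"
```

Fixing the dangling *b by moving two: silently rebinds *x, so a reordering tool would have to check every alias, not just the broken ones.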
My current experience is heavily oriented towards Docker and Kubernetes, which do make significant use of YAML in general. Kubernetes manifests almost never use YAML anchors, even in a couple of places where they might make sense (sharing labels: between a Deployment spec and its embedded Pod template, for example). For Docker Compose YAML files, the examples I've seen on Stack Overflow that use YAML anchors are trying to work around simply having too many settings.
When I've written code that produces YAML, I never try to produce anchors; I'm fine with some content getting duplicated if that's what happens. (And as often as not, if I need to produce YAML, my language's built-in JSON support will write out a valid YAML file, and JSON's syntax does not include anchors.)
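As a small illustration in the question's JavaScript setting: JSON.stringify repeats a shared object wherever it occurs, and since YAML 1.2 was designed so that JSON text is (for practical purposes) valid YAML, the result can be used as a YAML file. The catch is that a cyclic structure cannot be written this way at all, because JSON.stringify throws a TypeError for it. (js-yaml's dump also documents a noRefs option that disables converting duplicate objects into references; it is not used here.)

```javascript
// Build the question's structure: joe's Parents share objects with jeff and mary.
const obj = {
  jeff: { FullName: "Jeff Miller", ParentalConsentRequired: false },
  joe: { FullName: "Joe Miller", ParentalConsentRequired: true },
  mary: { FullName: "Mary Miller", ParentalConsentRequired: false }
};
obj.joe.Parents = [obj.jeff, obj.mary];

// JSON has no anchors, so each shared object is written out in full again.
const text = JSON.stringify(obj, null, 2);
console.log(text.split("Mary Miller").length - 1); // 2

// A cycle cannot be expanded this way, so JSON.stringify refuses it.
const cyclic = { name: "loop" };
cyclic.self = cyclic;
try {
  JSON.stringify(cyclic);
} catch (e) {
  console.log(e instanceof TypeError); // true
}
```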
YAML definitely has some complexities for the casual observer; people frequently get confused about lists vs. maps vs. maps as list items, for example. If you can, I'd avoid making it even more complex by introducing anchors into it, and I wouldn't worry about the legibility or ordering issues you discuss in the question.