Reputation: 25
I'm trying to understand the claim on https://pyyaml.org/wiki/PyYAML that:
PyYAML features
- a complete YAML 1.1 parser. In particular, PyYAML can parse all examples from the specification.
If you go to the online YAML parser that uses PyYAML (http://yaml-online-parser.appspot.com/), then several of the examples taken from the specification do not work.
I understand that some of these failures occur because tags would need to be defined, and that the online parser can only handle single-document YAML; I know how to "fix" that when I use PyYAML.
But example 11 fails as well, and it has no special tags and is a single document. How can PyYAML claim it can parse all examples when it obviously doesn't? Is this because PyYAML is for YAML 1.1 and the examples are from the YAML 1.2 specification?
Upvotes: 2
Views: 414
Reputation: 76578
To start with your last question: this is not because the examples come from a later specification. Assuming you restrict yourself to the Preview chapter/section in the spec (as the online parser does), and taking into account that I have only compared the examples visually (i.e. not character by character), the examples in chapter/section 2 of the 1.2 and 1.1 specifications are the same.
Your misinterpretation comes from the use of the word parser in the title of the online parser. What it actually tries to do is load the YAML and then dump it to JSON, Python or canonical YAML. Starting from the character-based document, loading in PyYAML consists of the stages shown in the Processing Overview picture in the YAML spec (the same for 1.1 and 1.2): a parsing, a composing and a construction step.
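As a rough illustration of those three stages, the following sketch runs each of them separately on a small document. The document string and the variable names are made up for this example; `yaml.parse`, `yaml.compose` and `yaml.safe_load` are PyYAML's module-level functions.

```python
import yaml  # PyYAML

# A small illustrative document (not taken from the spec).
doc = "key: [1, 2]\n"

# Stage 1, parsing: characters -> stream of events.
events = list(yaml.parse(doc))
print([type(event).__name__ for event in events])

# Stage 2, composing: events -> representation graph of nodes.
node = yaml.compose(doc)
print(type(node).__name__, node.tag)

# Stage 3, constructing: nodes -> native Python objects.
data = yaml.safe_load(doc)
print(data)
```

Each later stage builds on the result of the previous one, so a document can pass the first two stages and still fail in the third.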
PyYAML doesn't fail on the parsing step, it fails on the construction step, because (as @torek indicates) PyYAML constructs a list and that cannot be used as a key for a Python dict. This is a restriction of Python's dict implementation and IMO one of the deficiencies of PyYAML.
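The underlying Python restriction can be seen without any YAML library. This is only a minimal sketch with made-up variable names: a list is unhashable and is rejected as a dict key, while a tuple holding the same items is accepted.

```python
key_as_list = ['Detroit Tigers', 'Chicago cubs']
dates = ['2001-07-23']

# A list is mutable and therefore unhashable, so it cannot be a dict key.
try:
    schedule = {key_as_list: dates}
    list_key_error = None
except TypeError as exc:
    list_key_error = str(exc)
print(list_key_error)

# A tuple with the same items is hashable and works as a key.
schedule = {tuple(key_as_list): dates}
print(schedule)
```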
import sys
import yaml as pyyaml

yaml_1_1_example_2_11 = """\
? - Detroit Tigers
  - Chicago cubs
:
  - 2001-07-23

? [ New York Yankees,
    Atlanta Braves ]
: [ 2001-07-02, 2001-08-12,
    2001-08-14 ]
"""

# print the events produced by the parsing stage only
for event in pyyaml.parse(yaml_1_1_example_2_11):
    print(event)
gives:
StreamStartEvent()
DocumentStartEvent()
MappingStartEvent(anchor=None, tag=None, implicit=True)
SequenceStartEvent(anchor=None, tag=None, implicit=True)
ScalarEvent(anchor=None, tag=None, implicit=(True, False), value='Detroit Tigers')
ScalarEvent(anchor=None, tag=None, implicit=(True, False), value='Chicago cubs')
SequenceEndEvent()
SequenceStartEvent(anchor=None, tag=None, implicit=True)
ScalarEvent(anchor=None, tag=None, implicit=(True, False), value='2001-07-23')
SequenceEndEvent()
SequenceStartEvent(anchor=None, tag=None, implicit=True)
ScalarEvent(anchor=None, tag=None, implicit=(True, False), value='New York Yankees')
ScalarEvent(anchor=None, tag=None, implicit=(True, False), value='Atlanta Braves')
SequenceEndEvent()
SequenceStartEvent(anchor=None, tag=None, implicit=True)
ScalarEvent(anchor=None, tag=None, implicit=(True, False), value='2001-07-02')
ScalarEvent(anchor=None, tag=None, implicit=(True, False), value='2001-08-12')
ScalarEvent(anchor=None, tag=None, implicit=(True, False), value='2001-08-14')
SequenceEndEvent()
MappingEndEvent()
DocumentEndEvent()
StreamEndEvent()
So PyYAML can parse this correctly. Moreover, if the online "parser" emitted the parsed events directly when producing canonical YAML, instead of loading and then dumping, it could process this example (replacing the last two lines of the code above):
pyyaml.emit(pyyaml.parse(yaml_1_1_example_2_11), stream=sys.stdout, canonical=True)
as this gives:
---
{
  ? [
    ! "Detroit Tigers",
    ! "Chicago cubs",
  ]
  : [
    ! "2001-07-23",
  ],
  ? [
    ! "New York Yankees",
    ! "Atlanta Braves",
  ]
  : [
    ! "2001-07-02",
    ! "2001-08-12",
    ! "2001-08-14",
  ],
}
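To check where loading actually stops, the composing and construction stages can also be run separately. The sketch below uses a shortened, single-entry variant of the example (the string and variable names are made up for illustration): composing yields a sequence node as the key, and only construction raises an error.

```python
import yaml  # PyYAML
from yaml.constructor import ConstructorError

# Shortened variant of example 2.11 with one complex (sequence) key.
complex_key_doc = """\
? [ New York Yankees, Atlanta Braves ]
: [ 2001-07-02, 2001-08-12 ]
"""

# Composing succeeds: the key is a SequenceNode in the representation graph.
root = yaml.compose(complex_key_doc)
key_node, value_node = root.value[0]
print(type(key_node).__name__)

# Constructing fails: the sequence key becomes a list, which a dict rejects.
try:
    yaml.safe_load(complex_key_doc)
    construction_error = None
except ConstructorError as exc:
    construction_error = exc
print(construction_error)
```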
Stating that PyYAML parses all examples is like me stating that I can read Greek. I learned the Greek alphabet back in the 70s, so I can read (the) Greek (characters), but I don't understand the words they form.
In ruamel.yaml (disclaimer: I am the author of that package) you can load this example, and you can even use PyYAML to dump the loaded data.
import sys
from pprint import pprint

import ruamel.yaml
import yaml as pyyaml

# yaml_1_1_example_2_11 is the string defined in the first code block above
yaml = ruamel.yaml.YAML(typ='safe')
data = yaml.load(yaml_1_1_example_2_11)
pprint(data)
print('*' * 50)
yaml.dump(data, sys.stdout)
print('*' * 50)
pyyaml.safe_dump(data, sys.stdout)
as that gives:
{('Detroit Tigers', 'Chicago cubs'): [datetime.date(2001, 7, 23)],
 ('New York Yankees', 'Atlanta Braves'): [datetime.date(2001, 7, 2),
                                          datetime.date(2001, 8, 12),
                                          datetime.date(2001, 8, 14)]}
**************************************************
? [Detroit Tigers, Chicago cubs]
: [2001-07-23]
? [New York Yankees, Atlanta Braves]
: [2001-07-02, 2001-08-12, 2001-08-14]
**************************************************
? [Detroit Tigers, Chicago cubs]
: [2001-07-23]
? [New York Yankees, Atlanta Braves]
: [2001-07-02, 2001-08-12, 2001-08-14]
Upvotes: 4