Reputation: 5409
Is there a YAML library in Python that can read an input file one entry at a time, as needed, rather than parsing the whole file? I have a long file with a list as the root node. If I'm looking for the first element that satisfies a certain property, I may not need to read and parse the whole file, so I could get the result faster.
Upvotes: 1
Views: 1506
Reputation: 39788
You can use PyYAML's low-level parse() API:
import yaml
# input: a YAML string or an open file object
for event in yaml.parse(input):
    print(event)  # process event
The events are documented here.
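To make the event stream concrete, here is a small sketch that lists the event types PyYAML emits for a two-item list. The sample document and the variable names `text` and `event_names` are made up for illustration.

```python
import yaml

# Hypothetical sample document: a root-level list with a mapping and a scalar
text = "- a: 1\n- foo\n"

# yaml.parse is a generator, so events are only produced as you iterate;
# here we simply collect the class name of every event for display
event_names = [type(event).__name__ for event in yaml.parse(text)]

for name in event_names:
    print(name)
```

The stream starts with StreamStartEvent, DocumentStartEvent and SequenceStartEvent, and each item of the list then shows up as either a single ScalarEvent or a MappingStartEvent ... MappingEndEvent group.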
If you want to construct each item of a root-level sequence into a native Python value, you need to use the Composer and Constructor classes. Composer reads events and transforms them into nodes, Constructor builds Python values from nodes. This corresponds to the loading process defined in the YAML spec:
(source: yaml.org)
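As a rough illustration of those two stages, the sketch below runs them separately on a whole document, using a SafeLoader: get_single_node() composes the node graph, and construct_document() turns it into Python values. The sample document and variable names are assumptions for the example.

```python
import yaml

# Hypothetical sample document
text = "- a: 1\n- foo\n"

loader = yaml.SafeLoader(text)
try:
    # Parser + Composer: events -> node graph (representation)
    root = loader.get_single_node()
    # Constructor: node graph -> native Python values
    data = loader.construct_document(root)
finally:
    loader.dispose()

print(type(root).__name__, root.tag)
print(data)
```

This still processes the whole document at once; it only shows where the node stage sits between events and Python values.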
Now PyYAML's Composer expects functions get_event, check_event and peek_event to exist on self, but doesn't implement them. They are implemented by Parser. Therefore, to have a working YAML loading chain, PyYAML later does:
class Loader(Reader, Scanner, Parser, Composer, Constructor, Resolver):
    def __init__(self, stream):
        Reader.__init__(self, stream)
        Scanner.__init__(self)
        Parser.__init__(self)
        Composer.__init__(self)
        Constructor.__init__(self)
        Resolver.__init__(self)
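If you want to check this mixin arrangement yourself, a quick sketch like the following should do. It inspects the pure-Python SafeLoader (which uses SafeConstructor instead of Constructor); the variable names are only for illustration.

```python
import yaml
from yaml.parser import Parser
from yaml.composer import Composer
from yaml.constructor import SafeConstructor

# SafeLoader inherits from all of the processing stages
for base in (Parser, Composer, SafeConstructor):
    print(base.__name__, issubclass(yaml.SafeLoader, base))

# The event methods that Composer calls on self are defined on Parser,
# not on Composer itself
event_methods = ("get_event", "check_event", "peek_event")
defined_on_parser = all(name in vars(Parser) for name in event_methods)
defined_on_composer = any(name in vars(Composer) for name in event_methods)
print(defined_on_parser, defined_on_composer)
```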
For you, this means that you need a Loader object: use its Parser API for the top-level events, along with the Composer and Constructor API to load each item in the top-level sequence.
Here's some code that gets you started:
import yaml
input = """
- "A": 1
- "B": 2
- foo
- 1
"""
loader = yaml.SafeLoader(input)
# check proper stream start (should never fail)
assert loader.check_event(yaml.StreamStartEvent)
loader.get_event()
assert loader.check_event(yaml.DocumentStartEvent)
loader.get_event()
# assume the root element is a sequence
assert loader.check_event(yaml.SequenceStartEvent)
loader.get_event()
# now while the next event does not end the sequence, process each item
while not loader.check_event(yaml.SequenceEndEvent):
    # compose current item to a node as if it was the root node
    node = loader.compose_node(None, None)
    # construct a native Python value with the node.
    # we set deep=True for complete processing of all the node's children
    value = loader.construct_object(node, True)
    print(value)
# consume the SequenceEndEvent, then assume the document ends
# and no further documents are in the stream
loader.get_event()
assert loader.check_event(yaml.DocumentEndEvent)
loader.get_event()
assert loader.check_event(yaml.StreamEndEvent)
Be aware that you might run into problems if you have anchors & aliases in the YAML document.
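To get back to the original question (finding the first element that satisfies a property), the same steps can be wrapped in a generator so you can stop as soon as you have a match. This is only a sketch under the same assumptions as above: a single document with a sequence at the root. The names iter_root_items and find_first are hypothetical helpers, not part of PyYAML.

```python
import yaml


def iter_root_items(stream):
    """Yield the items of a root-level sequence one at a time (hypothetical helper)."""
    loader = yaml.SafeLoader(stream)
    try:
        # consume the stream, document and sequence start events
        for expected in (yaml.StreamStartEvent,
                         yaml.DocumentStartEvent,
                         yaml.SequenceStartEvent):
            if not loader.check_event(expected):
                raise ValueError("expected one document with a sequence at the root")
            loader.get_event()
        # compose and construct one item per iteration
        while not loader.check_event(yaml.SequenceEndEvent):
            node = loader.compose_node(None, None)
            yield loader.construct_object(node, True)
    finally:
        loader.dispose()


def find_first(stream, predicate):
    """Return the first item matching predicate, or None (hypothetical helper)."""
    for item in iter_root_items(stream):
        if predicate(item):
            return item
    return None


text = """
- "A": 1
- "B": 2
- foo
- 1
"""
match = find_first(text, lambda item: isinstance(item, dict) and "B" in item)
print(match)
```

Items after the match are never composed or constructed. You can also pass an open file object instead of a string; as far as I know, the pure-Python loader then reads the file in chunks as the scanner needs more input, but I haven't measured how much that saves in practice.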
Upvotes: 2