Reputation: 3010
I'm working on a PEG grammar that takes code in a Music Programming Language and creates a parse tree of musical events (notes, chords, volume/tempo changes, etc.). A feature of my MPL is that it supports voices, i.e. different sequences of events happening at the same time. I'm having a hard time getting my Instaparse grammar to parse this correctly... what I want is a voices
tag that consists of one or more voice
s, each of which consists of a voice definition (e.g. V1:
) and then any number of events. The voices
tag should end either with V0:
(which means the end of the split voices, and we're back to just one voice, or "voice zero"), or the end of the file.
Here is an excerpt of my grammar in progress (I'm leaving out the definitions of note
, chord
, etc. for the sake of clarity):
part = <ows> event+
<event> = chord | note | rest | octave-change |
attribute-change | voices |
marker | at-marker
voices = voice+
voice = !voices voice-number voice-events?
(<voice-zero> | #"\z")
voice-number = <"V"> #"[1-9]\d*" <":"> <ows>
<voice-zero> = <"V0:"> <ows>
voice-events = !voices event+
...
ows = #"\s*"
Given the following code:
V1: o2 b1/>b o2 g+/>g+ o2 g/>g
V0: e8 f+ g+ a b2
Running the parser gives the following output:
[:part
[:voices
[:voice [:voice-number "1"]
[:voice-events
[:octave-change "2"] [:chord [:note [:pitch "b"]
[:duration "1"]] [:octave-change ">"] [:note [:pitch "b"]]]
[:octave-change "2"] [:chord [:note [:pitch "g+"]]
[:octave-change ">"] [:note [:pitch "g+"]]]
[:octave-change "2"] [:chord [:note [:pitch "g"]]
[:octave-change ">"] [:note [:pitch "g"]]]]]]
[:note [:pitch "e"] [:duration "8"]]
[:note [:pitch "f+"]]
[:note [:pitch "g+"]]
[:note [:pitch "a"]]
[:note [:pitch "b"] [:duration "2"]]]
Which is exactly what I want. The V0:
signals the end of the voices
tag, and the last 5 notes are on their own inside the part
tag.
However, when I change the V0
to V2
, I get this:
[:part
[:voices
[:voice [:voice-number "1"]
[:voice-events
[:octave-change "2"] [:chord [:note [:pitch "b"] [:duration "1"]]
[:octave-change ">"] [:note [:pitch "b"]]] [:octave-change "2"]
[:chord [:note [:pitch "g+"]] [:octave-change ">"]
[:note [:pitch "g+"]]] [:octave-change "2"]
[:chord [:note [:pitch "g"]] [:octave-change ">"]
[:note [:pitch "g"]]]
[:voices
[:voice [:voice-number "2"]
[:voice-events
[:note [:pitch "e"] [:duration "8"]] [:note [:pitch "f+"]]
[:note [:pitch "g+"]] [:note [:pitch "a"]]
[:note [:pitch "b"] [:duration "2"]]]]]]]]]
For some reason, either the voice
1 tag or its voice-events
tag is not terminating like it's supposed to, and the second voice
is swallowed up as part of the first voice
's voice-events
. I also don't want there to be a second voices
tag; voice
2 should be within the primary voices
tag.
What I want is this:
[:part
[:voices
[:voice [:voice-number "1"]
[:voice-events
[:octave-change "2"] [:chord [:note [:pitch "b"] [:duration "1"]]
[:octave-change ">"] [:note [:pitch "b"]]] [:octave-change "2"]
[:chord [:note [:pitch "g+"]] [:octave-change ">"]
[:note [:pitch "g+"]]] [:octave-change "2"]
[:chord [:note [:pitch "g"]] [:octave-change ">"]
[:note [:pitch "g"]]]]]
[:voice [:voice-number "2"]
[:voice-events
[:note [:pitch "e"] [:duration "8"]] [:note [:pitch "f+"]]
[:note [:pitch "g+"]] [:note [:pitch "a"]]
[:note [:pitch "b"] [:duration "2"]]]]]]
I can't figure out what I'm doing wrong, but I think it has something to do with how I'm defining the voice
tag and/or the voice-events
tag. It may have something to do with how I'm using negative lookahead, which I don't think I fully understand yet. Can anyone figure out how I can fix my grammar?
Thanks! :)
Thanks, @DanielNeal! I've re-worked my grammar to this, which works exactly the way I want it to:
part = <ows> (voices | event)+
<event> = chord | note | rest | octave-change |
attribute-change | marker | at-marker
voices = voice+ (<voice-zero> | <#"\z">)
voice = voice-number event*
voice-number = <"V"> #"[1-9]\d*" <":"> <ows>
<voice-zero> = <"V0:"> <ows>
...
ows = #"\s*"
The big change was in how I defined part
and event
; before, I had these terms defined such that voices
is an event, so any subsequent voice
s were being consumed and lumped into the previous voice
's event
s. By pulling voices
out of the definition of an event
and redefining part
to be a variable number of voices
groupings or event
s, I eliminated the ambiguity and got the grammar to behave the way I want it to.
After that, the events
within a voice
were grouping properly, but I was still having a problem with each voice being within its own separate voices
tag, when I need them to all be within the same voices
grouping. I fixed this by specifying that a voices
tag ends either with a "V0:"
or the end of the file (\z
), in other words being more specific about how much code I want the voices
tag to consume.
The moral of the story is, if you're writing a PEG grammar and you're having problems, you probably need to make your definitions less ambiguous! I also ended up not using negative lookahead at all, which I think helped a lot to simplify / de-ambiguify my grammar.
Upvotes: 3
Views: 200
Reputation: 4173
I think you're right - it's the negative lookahead that's causing the problem. Without your full grammar, I can't test properly, but this line:
voice-events = !voices event+
Represents something that does not match voices
followed by one or more events
.
I'm assuming that voice-events
should not contain voices
within it in a recursive way, but at the moment it does - indirectly. Eachevent
can contain voices
within it, and in turn, voice-events
can contain events
.
In the sample above, the first event in V1 is an octave shift (which matches the not-a-voice condition). This allows the subsequent voice that occurs to get consumed within the event
definition. If that makes sense.
To fix this you could (perhaps) define it the other way round:
voice-event = chord | note | rest | octave-change | attribute-change | marker | at-marker
event = voice-event | voices
Upvotes: 2