Reputation: 1033
Suppose grammar G has the following productions …
Using grammar G, a derivation of the string aaaaabb is …
S ⇒ aSb ⇒ aaSbb ⇒ aaAbb ⇒ aaaaabb
In the Raku grammar shown below, why is parsing of the string aaaaabb not successful? Is the derivation shown above not applicable?
# Goal: language generated by grammar = { a^n b^m : n <= m + 3 }
# Example string is:
# aaaaabb = a^5 b^2, where 5 <= 2 + 3
# Derivation of string aaaaabb is:
# S => aSb => aaSbb => aaAbb => aaaaabb
grammar MyContextFreeGrammar
{
    token TOP { <S> }
    token S { 'a' <S> 'b' | <A> | <B> }
    token A { '' | 'a' | 'a' 'a' | 'a' 'a' 'a' }
    token B { 'b' <B> | 'b' }
} # end grammar
my $abString = 'aaaaabb';
my $myParseResult = MyContextFreeGrammar.parse( $abString ) ;
if $myParseResult.Bool
{
    printf( "\nParsing of \'%s\' was *SUCCESSFUL*.\n", $abString ) ;
}
else
{
    printf( "\nParsing of \'%s\' was *NOT* successful.\n", $abString ) ;
}
Program output is …
Parsing of 'aaaaabb' was *NOT* successful.
The Raku documentation (https://docs.raku.org/language/grammars) indicates that the declarators token and rule are both ratcheting, which means that the match engine won't back up and try again if it fails to match something.
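A minimal sketch of that ratcheting behavior, separate from the program above (the grammar names Ratchet and Backtrack are illustrative only):

grammar Ratchet   { token TOP { 'a'+ 'ab' } }   # token implies :ratchet (no backtracking)
grammar Backtrack { regex TOP { 'a'+ 'ab' } }   # regex allows backtracking

# In Ratchet, 'a'+ greedily consumes every 'a' and never gives one back,
# so the trailing 'ab' can never match and the parse fails.
say so Ratchet.parse( 'aaab' ) ;     # False
# In Backtrack, the engine backs up one 'a' so that 'ab' can match.
say so Backtrack.parse( 'aaab' ) ;   # True

The same effect defeats the alternative 'a' <S> 'b' in the grammar above: once the inner <S> has matched, a later failure to find enough closing 'b's cannot force the ratcheting engine back to try a shorter or different match for <S>.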
The revised grammar shown below, which uses regex rather than token, allows backtracking and behaves as desired for the two test strings aaaaabb (which is in the language generated by grammar G) and aaaaaabb (which is not in the language generated by grammar G).
grammar MyContextFreeGrammar
{
    regex TOP { <S> }
    regex S { 'a' <S> 'b' | <A> | <B> }
    regex A { '' | 'a' | 'a' 'a' | 'a' 'a' 'a' }
    regex B { 'b' <B> | 'b' }
} # end grammar
my $abString = 'aaaaabb';
my $myParseResult = MyContextFreeGrammar.parse( $abString ) ;
printf( "\nParsing of \'%s\' was "
~ ($myParseResult ?? "*SUCCESSFUL*.\n" !! "*NOT* successful.\n" ),
$abString ) ;
$abString = 'aaaaaabb';
$myParseResult = MyContextFreeGrammar.parse( $abString ) ;
printf( "\nParsing of \'%s\' was "
~ ($myParseResult ?? "*SUCCESSFUL*.\n" !! "*NOT* successful.\n" ),
$abString ) ;
Program output is …
Parsing of 'aaaaabb' was *SUCCESSFUL*.
Parsing of 'aaaaaabb' was *NOT* successful.
Upvotes: 7
Views: 176
Reputation: 32489
A natural idiomatic solution to the problem example:
grammar S { token TOP { (a*) (b*) <?{ $0.chars <= $1.chars + 3 }> } }
This directly encodes your example's description in a parser -- the token¹ does a match/capture of zero or more a's, then a match/capture of zero or more b's, and finally a predicate check corresponding to your example's formula.
Test code and results:
say "$_: {so S.parse: $_}" for ^9 .map: { 'a' x $_ ~ 'bb' }
bb: True
abb: True
aabb: True
aaabb: True
aaaabb: True
aaaaabb: True
aaaaaabb: False
aaaaaaabb: False
aaaaaaaabb: False
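An equivalent formulation (my own variation, not part of the answer above) uses named captures instead of the positional $0/$1, which some find more readable:

grammar S2 {
    token TOP { $<as>=[ 'a'* ] $<bs>=[ 'b'* ] <?{ $<as>.chars <= $<bs>.chars + 3 }> }
}
say so S2.parse( 'aaaaabb' ) ;    # True  (5 <= 2 + 3)
say so S2.parse( 'aaaaaabb' ) ;   # False (6 >  2 + 3)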
One can use generative grammars with Raku just as much as one can use them with any other PL.
But the vision and mission for Raku includes first class grammars (and grammar fragments) that can be composed.
If you combine two or more generative CFGs, the result may be ambiguous even if the constituent grammars are unambiguous. There is no algorithm that can decide whether a CFG is ambiguous, so there can't even be a warning or error when a composition yields an ambiguous result. So generative grammars could not be used for Raku's built-in grammar construct.
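A standard textbook illustration of that composition problem (my own addition, not from the original answer): each of the two small grammars below is unambiguous on its own, yet combining them gives the string aabbcc two distinct derivations, and because the combined language { a^i b^i c^j } ∪ { a^i b^j c^j } is inherently ambiguous, no grammar for it can avoid this.

S  → S1 | S2
S1 → X C     X → a X b | a b     C → c C | c        (generates a^i b^i c^j)
S2 → A Y     A → a A | a         Y → b Y c | b c    (generates a^i b^j c^j)

S ⇒ S1 ⇒ X C ⇒ a X b C ⇒ a a b b C ⇒ a a b b c C ⇒ aabbcc
S ⇒ S2 ⇒ A Y ⇒ a A Y ⇒ a a Y ⇒ a a b Y c ⇒ aabbcc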
To be clear, grammar composition isn't just a very-nice-to-have but something that's increasingly essential in an increasingly polyglot world. Folk want to be able to do things like embed SQL in general-purpose language (GPL) code. Now what? If the SQL is in a string you're subject to injection attacks. Larry Wall decided there had to be a better way, a principled way: composable grammars.
cf my comments in the reddit thread Ergonomic inline SQL as a Python library. Two key excerpts:
(One key thing I forgot to mention and that might not be obvious is that Raku slangs are fully integrated at compile time. So there can be compile-time syntax, type, etc. errors, and full defense against injection attacks, which is a lot harder if the DSL (eg SQL) is in strings.)
Wow, I had no idea that such a language "braid" idea existed. Thanks for sharing that little bit of Raku. The "sql insert into ...; with" line is especially cool, since you literally are writing multiple languages in each other with very little syntax noise.
¹ The token declarator implies ratcheting:
Ratcheting can be an optimization, because backtracking is costly. But more importantly, it closely corresponds to how humans parse a text.
The doc fails to mention another important aspect: elimination of (the risk of) catastrophic backtracking.
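A rough sketch of that risk (my own illustration, not from the documentation), using the classic nested-quantifier pattern:

my $haystack = 'a' x 30;   # contains no 'b', so any match must ultimately fail

# With a plain backtracking regex the engine may try a combinatorial number of
# ways to split the a's between the two quantifiers before giving up.
# Uncomment with care:
# say so $haystack ~~ m/ [ 'a'+ ]+ 'b' /;

# With :ratchet -- which is what the token declarator implies -- nothing is ever
# given back, so the failure is reported after a quick scan:
say so $haystack ~~ m/ :ratchet [ 'a'+ ]+ 'b' /;   # False, and fast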
Upvotes: 11