user3134725

Reputation: 1033

Why is parsing of aaaaabb not successful assuming the given grammar?


Suppose grammar G has the following productions …

  1. S → aSb | A | B
  2. A → λ | a | aa | aaa
  3. B → bB | b

Using grammar G, a derivation of the string aaaaabb is …

S ⇒ aSb ⇒ aaSbb ⇒ aaAbb ⇒ aaaaabb


In the Raku grammar shown below, why is parsing of the string aaaaabb not successful? Is the derivation shown above not applicable?


# Goal: language generated by grammar = { a^n b^m : n <= m + 3 }
# Example string is:
#   aaaaabb = a^5 b^2, where 5 <= 2 + 3
# Derivation of string aaaaabb is:
#   S => aSb => aaSbb => aaAbb => aaaaabb

grammar MyContextFreeGrammar
    {
    token TOP { <S> }
    token S   { 'a' <S> 'b' | <A> | <B> }
    token A   { '' | 'a' | 'a' 'a' | 'a' 'a' 'a' }
    token B   { 'b' <B> | 'b' }
    } # end grammar

my $abString = 'aaaaabb';

my $myParseResult = MyContextFreeGrammar.parse( $abString ) ;

if $myParseResult.Bool
  {
  printf( "\nParsing of \'%s\' was *SUCCESSFUL*.\n", $abString ) ;
  }
else
  {
  printf( "\nParsing of \'%s\' was *NOT* successful.\n", $abString ) ;
  }

Program output is …

Parsing of 'aaaaabb' was *NOT* successful.

Update

Raku Documentation (https://docs.raku.org/language/grammars) indicates that …

(The declarators token and rule) are both ratcheting, which means that the match engine won't back up and try again if it fails to match something.
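The effect of ratcheting can be seen with a much smaller grammar (a minimal sketch; the grammar names here are made up for illustration):

```raku
# A token ratchets: once 'a+' has greedily consumed every 'a', the
# engine will not give one back, so the literal 'ab' can never match.
grammar Ratcheting   { token TOP { a+ ab } }

# A regex backtracks: 'a+' gives back its final 'a' so 'ab' can match.
grammar Backtracking { regex TOP { a+ ab } }

say so Ratcheting.parse( 'aaab' ) ;     # False
say so Backtracking.parse( 'aaab' ) ;   # True
```

The failing trailing 'b' in the question's grammar plays the same role as the literal ab here: by the time it is tried, the ratcheting inner <S> has already committed to consuming too much of the input.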

The revised grammar shown below, which uses regex rather than token, allows for backtracking and behaves as desired for the two test strings aaaaabb (which is in the language generated by grammar G) and aaaaaabb (which is not). The token version fails because once an inner <S> (or one of the alternatives inside it) has matched, the engine will not back up and re-try it when a later 'b' fails to match, so the overall parse fails even though a successful derivation exists.

grammar MyContextFreeGrammar
    {
    regex TOP { <S> }
    regex S   { 'a' <S> 'b' | <A> | <B> }
    regex A   { '' | 'a' | 'a' 'a' | 'a' 'a' 'a' }
    regex B   { 'b' <B> | 'b' }
    } # end grammar

my $abString = 'aaaaabb';
my $myParseResult = MyContextFreeGrammar.parse( $abString ) ;
printf( "\nParsing of \'%s\' was "
        ~ ($myParseResult ?? "*SUCCESSFUL*.\n" !! "*NOT* successful.\n" ),
        $abString ) ;

$abString = 'aaaaaabb';
$myParseResult = MyContextFreeGrammar.parse( $abString ) ;
printf( "\nParsing of \'%s\' was "
        ~ ($myParseResult ?? "*SUCCESSFUL*.\n" !! "*NOT* successful.\n" ),
        $abString ) ;

Program output is …

Parsing of 'aaaaabb' was *SUCCESSFUL*.

Parsing of 'aaaaaabb' was *NOT* successful.

Upvotes: 7

Views: 176

Answers (1)

raiph

Reputation: 32489

Raku grammars are parsers, not generative grammars

A natural idiomatic solution to the problem example:

grammar S { token TOP { (a*) (b*) <?{ $0.chars <= $1.chars + 3 }> } }

This directly encodes your example's description in a parser: the token¹ does a match/capture of zero or more a's, then a match/capture of zero or more b's, and finally a predicate check corresponding to your example's formula n <= m + 3.


Test code and results:

say "$_: {so S.parse: $_}" for ^9 .map: { 'a' x $_ ~ 'bb' }

bb: False
abb: True
aabb: True
aaabb: True
aaaabb: True
aaaaabb: True
aaaaaabb: False
aaaaaaabb: False
aaaaaaaabb: False

One can use generative grammars with Raku just as much as one can use them with any other PL.

But the vision and mission for Raku includes first class grammars (and grammar fragments) that can be composed.
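As a rough illustration of what composition looks like (a sketch with made-up grammar names; Raku grammars are classes, so one grammar can reuse another's tokens via inheritance):

```raku
# An 'identifier' sublanguage defined once ...
grammar Identifiers { token ident { <.alpha> \w* } }

# ... reused by composing it into a larger grammar via inheritance.
grammar Assignment is Identifiers
    {
    token TOP { <ident> \s* '=' \s* \d+ }
    } # end grammar

say so Assignment.parse( 'answer = 42' ) ;   # True
```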

If you combine two or more generative CFGs, the result may be ambiguous even if the constituent grammars are unambiguous. Whether a CFG is ambiguous is undecidable, so there can't even be a warning or error when a composition yields an ambiguous result.

So generative grammars could not be used for Raku's built-in grammar construct.


To be clear, grammar composition isn't just a very-nice-to-have but something that's increasingly essential in an increasingly polyglot world. Folk want to be able to do things like embed SQL in GPL code. Now what? If the SQL is in a string you're subject to injection attacks. Larry Wall decided there had to be a better way, a principled way: composable grammars.

cf. my comments in the Reddit thread Ergonomic inline SQL as a Python library. Two key excerpts:

(One key thing I forgot to mention and that might not be obvious is that Raku slangs are fully integrated at compile time. So there can be compile-time syntax, type, etc. errors, and full defense against injection attacks, which is a lot harder if the DSL (eg SQL) is in strings.)

Wow, I had no idea that such a language "braid" idea existed. Thanks for sharing that little bit of Raku. The "sql insert into ...; with" line is especially cool, since you literally are writing multiple languages in each other with very little syntax noise.

Footnotes

¹ The token declarator implies ratcheting:

Ratcheting can be an optimization, because backtracking is costly. But more importantly, it closely corresponds to how humans parse a text.

The doc fails to mention another important aspect: elimination of (the risk of) catastrophic backtracking.
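For instance, a nested quantifier followed by a literal that fails is a classic trigger for catastrophic backtracking in backtracking engines; in the sketch below (illustrative grammar name), declaring the pattern as a token makes the failure immediate, because no alternative splits of the a's are retried:

```raku
# [a+]+ is a nested quantifier ([ ] is a non-capturing group in Raku).
# In a backtracking engine, a failure after it can explore exponentially
# many ways to split the a's; a token ratchets, so it fails after a
# single attempt.
grammar NestedQuantifier { token TOP { [a+]+ b } }

say so NestedQuantifier.parse( 'a' x 30 ~ 'c' ) ;   # False, and fast
```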

Upvotes: 11
