Harindra Singh
Harindra Singh

Reputation: 371

Grammar Ambiguous or Not Ambiguous?

I'm new to BNF yet, I've got a tutorial question to solve. Given below is the question.

'For each of the following grammars specify whether they are ambiguous or unambiguous'.

Grammar1:

<T> ::= <T> <Q> 0 | 22
<Q> ::= 2|3 

Grammar2:

<first>::=<first><first><second>
<second>::=<third><second>
<third>::=a|b|c

Grammar3:

<A>::=<B><A><A>
<B>::=1|2|3|4

can somebody please help me to find the answers and describe in a way to easily understand that is a great help. so please.

Upvotes: 1

Views: 666

Answers (1)

Ira Baxter
Ira Baxter

Reputation: 95430

To detect an ambiguity in the grammar, you need to exhibit a string that can be parsed two ways.

Finding such a string can be hard with a large grammar; in fact, it can be impossibly hard.

But you do this by hand by exploring various token sequences. This gets dull fast, and doesn't work in practice if the grammar is anything but trivial.

What you really want to do is build a tool that enumerates possible strings of characters and tries them to see if there is an ambiguity.

You can do this brute force by simply generating all strings, but this rapidly produces many strings that are simply unparseable and that doesn't help.

Or, you can generate strings using the grammar as a guide, ensuring that each extension to a proposed string produces something the grammar will still accept. This way all generated strings are valid, so at least you are producing valid junk.

You can do this a depth-first search across the grammar rules. You end up mechanizing the following process:

 1.  Pick a pair of rules with the same LHS.
 2.  Instantiate  S1 with the RHS of the first rule, S2 with the RHS of the second.
 3.  Repeat until you are tired (hit some search depth):
     a. if s1 == s2, you've found an ambiguity.
     b. if s1 derives a terminal that s2 does not derive,
        then s1 and s2 cannot be ambiguous.
     c. Pick a nonterminal in s1 or s2.
        If there is none, then if s1 <> s2, this path doesn't lead to an ambiguity: backtrack.
     d. Replace the nonterminal with a valid RHS for that nonterminal.
     e. Recurse to a.
 4.  If all branches of the search lead to non-ambiguous strings,
     then this rule isn't ambiguous.

The DMS Software Reengineering Toolkit has a parser generator with this capability built in; we can simply try the grammars. I had to reformulate the grammars slightly to make them compatible with DMS, so I show the new version here:

Grammar1:

<T> ::= <T><Q> '0' ;
<T> ::= '2' '2' ;
<Q> ::= '2' ;
<Q> ::= '3' ;

DMS run on Grammar1:

C:\DMS\Domains\DMSStringGrammar\Tools\ParserGenerator\Source>run ..\ParserGenerator.P0B -interactive \temp\Grammar1.bnf
DMS GLR Parser Generator 2.3.5
Copyright (C) 1997-2017 Semantic Designs, Inc.
<<<Rule Collection Completed>>>
NTokens = 6 NRules = 4
*** LR(0) State Machine construction complete ***
States: 8
What next? ambiguities 10
Nonterminal <Q> is not ambiguous
*** Search for ambiguities to depth 1...
*** Search for ambiguities to depth 2...
*** Search for ambiguities to depth 3...
*** Search for ambiguities to depth 4...
 Nonterminal <T> is not ambiguous
*** All ambiguities in grammar detected ***

The tool reports that all nonterminals are not ambiguous. So, Grammar1 is not ambiguous.

Grammar2:

<first> = <first><first><second> ;
<second> = <third><second> ;
<third> = 'a' ;
<third> = 'b' ;
<third> = 'c' ;

DMS run on Grammar2:

C:\DMS\Domains\DMSStringGrammar\Tools\ParserGenerator\Source>run ..\ParserGenerator.P0B -interactive \temp\Grammar2.bnf
DMS GLR Parser Generator 2.3.5
Copyright (C) 1997-2017 Semantic Designs, Inc.
<<<Rule Collection Completed>>>
NTokens = 7 NRules = 5
*** LR(0) State Machine construction complete ***
Determining if machine is SLR(0), SLR(1) or LALR(1)...
States: 9
Detecting possible cycles...

*** Circular definition:
Rule 1: <first> = <first> <first> <second> ;

*** Circular definition:
Rule 2: <second> = <third> <second> ;

What next? ambiguities 10
Nonterminal <first> is circularly defined
Nonterminal <second> is circularly defined
Nonterminal <third> is not ambiguous
*** Search for ambiguities to depth 1...
*** All ambiguities in grammar detected ***

This grammar has a problem OP didn't ask about: the tokens <first> and <second> are ill-defined ("circularly defined" according to this tool). It should be clear that <first> expands starting with <first> but there is nothing provide to tell us what <first> can be expand to as a concrete literal. So the grammar isn't ambiguous... it is just outright broken.

Grammar3:

<A> = <B><A><A> ;
<B> = '1' ;
<B> = '2' ;
<B> = '3' ;
<B> = '4' ;

DMS run on Grammar3:

C:\DMS\Domains\DMSStringGrammar\Tools\ParserGenerator\Source>run ..\ParserGenerator.P0B -interactive \temp\Grammar3.bnf
DMS GLR Parser Generator 2.3.5
Copyright (C) 1997-2017 Semantic Designs, Inc.
<<<Rule Collection Completed>>>
NTokens = 7 NRules = 5
LR(0) State Machine Generation Phase.
*** LR(0) State Machine construction complete ***

States: 8

Detecting possible cycles...

*** Circular definition:
Rule 1: <A> = <B> <A> <A> ;

What next? ambiguities 10
Nonterminal <A> is circularly defined
Nonterminal <B> is not ambiguous
*** Search for ambiguities to depth 1...
*** All ambiguities in grammar detected ***

This grammar is also broken in a way OP didn't discuss. Here the problem is that we can find a substitution for <A>, but it leads to an infinite expansion. The grammar isn't ambiguous, but it only accepts infinitely long strings, which are not useful in practice.

Now, none of these grammars are ambiguous in the sense the OP actually wanted. Here I show a classic ambiguous grammar based on if-then-else statements with dangling else:

Grammar4:

G = S ;
S = 'if' E 'then' S ;
S = 'if' E 'then' S 'else' S ;
S = V '=' E ;

DMS Run on Grammar4:

C:\DMS\Domains\DMSStringGrammar\Tools\ParserGenerator\Source>run ..\ParserGenerator.P0B -interactive ..\Tests\ifthenelse_ambiguous.bnf
DMS GLR Parser Generator 2.3.5
Copyright (C) 1997-2017 Semantic Designs, Inc.
Opening ..\Tests\ifthenelse_ambiguous.bnf
<<<Rule Collection Completed>>>
NTokens = 9 NRules = 4
*** LR(0) State Machine construction complete ***

What next? ambiguities 10

Nonterminal G is not ambiguous
*** Search for ambiguities to depth 1...
*** Search for ambiguities to depth 2...

Ambiguous Rules:
S = 'if' E 'then' S 'else' S ; SemanticCopy2
S = 'if' E 'then' S ; SemanticCopy2
Instance: < 'if' E 'then' 'if' E 'then' S 'else' S >
Derivation:
 1: < S 'else' S >
    < S >
 2: < 'if' E 'then' S 'else' S >
    < 'if' E 'then' S 'else' S >
*** All ambiguities in grammar detected ***

The search finds an ambiguous instance phrase for statement. If you look at the instance phrase, you should see that there is one else clause... and the grammar allows it to attach itself to either if-then statement.

You don't need a tool like this for really tiny grammars; you can do it by looking at the rules and working it out. But for a large grammar, this is hard, and that's where a tool like this is really useful.

Consider a run on a Java version 8 grammar, with over 400 rules:

C:\DMS\Domains\DMSStringGrammar\Tools\ParserGenerator\Source>run ..\ParserGenerator.P0B -interactive "C:\DMS\Domains\Java\v8\tools\Parser\Source\Syntax\%Java~v8.bnf"
DMS GLR Parser Generator 2.3.5
Copyright (C) 1997-2017 Semantic Designs, Inc.
Opening C:\DMS\Domains\Java\v8\tools\Parser\Source\Syntax\%Java~v8.bnf
<<<Rule Collection Completed>>>
NTokens = 243 NRules = 410
*** LR(0) State Machine construction complete ***
States: 774

What next? ambiguities 15

Nonterminal optional_CONTROL_Z is not ambiguous
Nonterminal package_name_declaration is not ambiguous
Nonterminal anonymous_class_creation is not ambiguous
Nonterminal annotation_type_declaration is not ambiguous
Nonterminal annotation_interface_header is not ambiguous
Nonterminal default_value is not ambiguous
Nonterminal field_declaration is not ambiguous
Nonterminal member_value is not ambiguous
Nonterminal marker_annotation is not ambiguous
Nonterminal single_member_annotation is not ambiguous
Nonterminal enum_body is not ambiguous
Nonterminal type_parameters is not ambiguous
Nonterminal modifier is not ambiguous
Nonterminal local_variable_declaration is not ambiguous
Nonterminal vararg_parameter is not ambiguous
Nonterminal variable_declarator_id is not ambiguous
Nonterminal variable_initializer is not ambiguous
Nonterminal primitive_type is not ambiguous
Nonterminal try_resource_list_opt is not ambiguous
Nonterminal catch_statements_opt is not ambiguous
Nonterminal finally_statement_opt is not ambiguous
Nonterminal finally_statement is not ambiguous
Nonterminal switch_group is not ambiguous
Nonterminal switch_label is not ambiguous
Nonterminal catch_statement is not ambiguous
Nonterminal catch_parameter is not ambiguous
Nonterminal literal is not ambiguous
Nonterminal array_dims is not ambiguous
Nonterminal array_creation_with_initialization is not ambiguous
Nonterminal dim_spec is not ambiguous
Nonterminal superpath is not ambiguous
Nonterminal thispath is not ambiguous
Nonterminal target is not ambiguous
Nonterminal unary_expression is not ambiguous
Nonterminal lambda_body is not ambiguous
Nonterminal right_angle is not ambiguous
*** Search for ambiguities to depth 1...
 Nonterminal type_declarations is not ambiguous
 Nonterminal annotations_opt is not ambiguous
 Nonterminal modifiers is not ambiguous
 Nonterminal brackets is not ambiguous
 Nonterminal switch_groups is not ambiguous
 Nonterminal bounds_list is not ambiguous
*** Search for ambiguities to depth 2...
 Nonterminal class_body is not ambiguous
 Nonterminal arguments is not ambiguous
 Nonterminal annotation_type_body is not ambiguous
 Nonterminal qualified_name is not ambiguous
 Nonterminal interface_body is not ambiguous
 Nonterminal enum_class_header is not ambiguous
 Nonterminal enum_class_body_opt is not ambiguous
 Nonterminal block is not ambiguous

Ambiguous Rules:
executable_statement = 'if' '(' expression ')' executable_statement 'else' executable_statement ; SemanticCopy2
executable_statement = 'if' '(' expression ')' executable_statement ; SemanticCopy2
Instance: < 'if' '(' expression ')' 'if' '(' expression ')' executable_statement 'else' executable_statement >
Derivation:
 1: < executable_statement 'else' executable_statement >
< executable_statement >
 2: < 'if' '(' expression ')' executable_statement 'else' executable_statement >
< 'if' '(' expression ')' executable_statement 'else' executable_statement >
 Nonterminal variable_declarator is not ambiguous
 Nonterminal try_resource_list is not ambiguous
*** Search for ambiguities to depth 3...
 Nonterminal type_arguments is not ambiguous
 Nonterminal member_value_pair is not ambiguous
*** Search for ambiguities to depth 4...
 Nonterminal array_creation_no_initialization is not ambiguous
 Nonterminal array_creation_with_initialization_header is not ambiguous
*** Search for ambiguities to depth 5...
 Nonterminal compilation_unit is not ambiguous
 Nonterminal nested_class_declaration is not ambiguous
 Nonterminal interface_header is not ambiguous
*** Search for ambiguities to depth 6...
 Nonterminal name is not ambiguous
 Nonterminal normal_annotation is not ambiguous
 Nonterminal type_parameter is not ambiguous
*** Search for ambiguities to depth 7...
 Nonterminal enum_constant is not ambiguous
 Nonterminal bound is not ambiguous
*** Search for ambiguities to depth 8...
 Nonterminal annotation is not ambiguous
 Nonterminal type is not ambiguous
 Nonterminal catch_statements is not ambiguous
 Nonterminal value_suffix is not ambiguous

Ambiguous Rules:
method_reference = type '::' type_arguments IDENTIFIER ; SemanticCopy2
method_reference = primary '::' type_arguments IDENTIFIER ; SemanticCopy2
Instance: < IDENTIFIER '::' type_arguments IDENTIFIER >
Derivation:
 1: < type '::' type_arguments IDENTIFIER >
< primary '::' type_arguments IDENTIFIER >
 2: < type >
< primary >
 3: < name brackets >
< primary >
 4: < annotations_opt IDENTIFIER type_arguments brackets >
< primary >
 5: < IDENTIFIER type_arguments brackets >
< primary >
 6: < IDENTIFIER type_arguments brackets >
< primary_not_new_array >
 7: < IDENTIFIER type_arguments brackets >
< IDENTIFIER >
 8: < type_arguments brackets >
< >
*** Search for ambiguities to depth 9...
 Nonterminal enum_constants is not ambiguous
 Nonterminal type_argument is not ambiguous
*** Search for ambiguities to depth 10...
 Nonterminal parameter is not ambiguous
*** Search for ambiguities to depth 11...
 Nonterminal class_header is not ambiguous
 Nonterminal nested_interface_declaration is not ambiguous
*** Search for ambiguities to depth 12...
 Nonterminal import_statement is not ambiguous
 Nonterminal type_declaration is not ambiguous
 Nonterminal name_list is not ambiguous
 Nonterminal variable_declarator_list is not ambiguous
 Nonterminal formal_name_list is not ambiguous
*** Search for ambiguities to depth 13...
*** Search for ambiguities to depth 14...
 Nonterminal method_declaration is not ambiguous

This takes about 5 minutes to run because it is computing an exponentially growing set of instance strings. But we learn:

1) Java has the dangling else problem, too! (In our parsers we handle this by "prefer shift on 'else'" rule, which this ambiguity detector doesn't know about.

2) The grammar rule for method_reference is ambiguous. I think it is this way in the actual Java standard, too. This is actually handled in our parsers in the name resolver, by looking at the type of the IDENTIFIER.

Easy to talk about a tool like this but its a lot trickier to code it and have it handle big grammars. I've run a 3000 rule COBOL grammar through our tool and had it check some 480 billion different string expansions. Still don't know if the whole grammar is ambiguous or not. (It did catch silly stuff which we fixed).

Upvotes: 1

Related Questions