Reputation: 865
I've been doing Udacity CS262 and for the Detecting Ambiguity problem I'm not sure if my solution is correct and I'm not sure if the "official" solution is correct either.
Brief description of the problem: Write a function isambig(grammar, start, string) that takes a finite context free grammar (encoded as a python dictionary), the start symbol of the grammar, and a string. If there are two parse trees leading to the string, then the grammar is ambiguous (or at least that is my understanding of ambiguity - please correct me if I'm mistaken). If the grammar is ambiguous, return True. Else return False.
Test cases:
grammar1 = [
("S", [ "P", ]),
("S", [ "a", "Q", ]) ,
("P", [ "a", "T"]),
("P", [ "c" ]),
("Q", [ "b" ]),
("T", [ "b" ]),
]
print isambig(grammar1, "S", ["a", "b"]) == True
print isambig(grammar1, "S", ["c"]) == False
grammar2 = [
("A", [ "B", ]),
("B", [ "C", ]),
("C", [ "D", ]),
("D", [ "E", ]),
("E", [ "F", ]),
("E", [ "G", ]),
("E", [ "x", "H", ]),
("F", [ "x", "H"]),
("G", [ "x", "H"]),
("H", [ "y", ]),
]
print isambig(grammar2, "A", ["x", "y"]) == True
print isambig(grammar2, "E", ["y"]) == False
grammar3 = [ # Rivers in Kenya
("A", [ "B", "C"]),
("A", [ "D", ]),
("B", [ "Dawa", ]),
("C", [ "Gucha", ]),
("D", [ "B", "Gucha"]),
("A", [ "E", "Mbagathi"]),
("A", [ "F", "Nairobi"]),
("E", [ "Tsavo" ]),
("F", [ "Dawa", "Gucha" ])
]
print isambig(grammar3, "A", ["Dawa", "Gucha"]) == True
print isambig(grammar3, "A", ["Dawa", "Gucha", "Nairobi"]) == False
print isambig(grammar3, "A", ["Tsavo"]) == False
I have added my own test case. I'm not sure if this is correct, but I can only see possible one parse tree leading to the string "a b", so the string does not prove the grammar ambiguous. And I do not believe the grammar is ambiguous.
grammar4 = [ # Simple test case
("S", [ "P", "Q"]),
("P", [ "a", ]),
("Q", [ "b", ]),
]
print isambig(grammar4, "S", ["a", "b"]) == False
Here is the "official" program:
def expand(tokens_and_derivation, grammar):
(tokens,derivation) = tokens_and_derivation
for token_pos in range(len(tokens)):
for rule_index in range(len(grammar)):
rule = grammar[rule_index]
if tokens[token_pos] == rule[0]:
yield ((tokens[0:token_pos] + rule[1] + tokens[token_pos+1:]), derivation + [rule_index])
def isambig(grammar, start, utterance):
enumerated = [([start], [])]
while True:
new_enumerated = enumerated
for u in enumerated:
for i in expand(u,grammar):
if not i in new_enumerated:
new_enumerated = new_enumerated + [i]
if new_enumerated != enumerated:
enumerated = new_enumerated
else:
break
result = [xrange for xrange in enumerated if xrange[0] == utterance]
print result
return len(result) > 1
Here is my own, much longer program:
def expand(grammar, symbol):
result = []
for rule in grammar:
if rule[0] == symbol:
result.append(rule[1])
return result
def expand_first_nonterminal(grammar, string):
result = []
for i in xrange(len(string)):
if isterminal(grammar, string[i]) == False:
for j in expand(grammar, string[i]):
result.append(string[:i]+j+string[i+1:])
return result
return None
def full_expand_string(grammar,string, result):
for i in expand_first_nonterminal(grammar,string):
if allterminals(grammar,i):
result.append(i)
else:
full_expand_string(grammar,i,result)
def isterminal(grammar,symbol):
for rule in grammar:
if rule[0] == symbol:
return False
return True
def allterminals(grammar,string):
for symbol in string:
if isterminal(grammar,symbol) == False:
return False
return True
def returnall(grammar, start):
result = []
for rule in grammar:
if rule[0] == start:
if allterminals(grammar,rule[1]):
return rule[1]
else:
full_expand_string(grammar, rule[1], result)
return result
def isambig(grammar, start, utterance):
count = 0
for i in returnall(grammar,start):
if i == utterance:
count+=1
if count > 1:
return True
else:
return False
Now, my program passes all the test cases including the one I added (grammar4), but the official solution passes all test cases except the one I added. It seems to me that either the test case is wrong, or the official solution is wrong.
Is the official solution correct? Is my solution correct?
Upvotes: 5
Views: 1915
Reputation: 518
To me it looks like grammar4
is not ambiguous.
There is only one parse tree:
S -> PQ
P -> a
Q -> b
S
|
___|____
P Q
| |
a b
However, the official program says that it is ambiguous because it uses the rules
P -> a
and Q -> b
consecutively:
[(['a', 'b'], [0, 1, 2]), (['a', 'b'], [0, 2, 1])]
(Now there are two rule-sequences 0,1,2
and 0,2,1
.)
So the "official" program seems to detect grammar4
incorrectly as being ambiguous.
UPDATE: I looked through your code and did a few tests and, apart from not handling recursion (official version doesn't handle recursion either), your program seems to correctly differentiate between ambiguous and unambiguous.
Simple test:
grammar5 = [
("S", ["A", "B"]),
("S", ["B", "A"]),
("A", ["a"]),
("B", ["a"]),
]
print(isambig(grammar5, "S", ["a", "a"]))
S -> AB
S -> BA
A -> a
B -> a
S
|
___|____
A B
| |
a a
S
|
___|____
B A
| |
a a
Your version returns "ambiguous" (as does the "official" version.)
If you remove ("S", ["B", "A"])
, your version correctly
switches to "not ambiguous", whereas the other version still returns "ambiguous"
(we're back at the grammar4 case.)
Maybe someone else (with a bit more experience than me) can chime in.
UPDATE 2: Ira Baxter mentioned that it is an undecidable problem whether a context-free grammar is ambiguous.
See also How is proving a context free language to be ambiguous undecidable?
Upvotes: 2