user2108462
user2108462

Reputation: 865

Are these Python programs for detecting the ambiguity of a finite grammar correct?

I've been doing Udacity CS262 and for the Detecting Ambiguity problem I'm not sure if my solution is correct and I'm not sure if the "official" solution is correct either.

Brief description of the problem: Write a function isambig(grammar, start, string) that takes a finite context free grammar (encoded as a python dictionary), the start symbol of the grammar, and a string. If there are two parse trees leading to the string, then the grammar is ambiguous (or at least that is my understanding of ambiguity - please correct me if I'm mistaken). If the grammar is ambiguous, return True. Else return False.

Test cases:

grammar1 = [
       ("S", [ "P", ]),
       ("S", [ "a", "Q", ]) ,
       ("P", [ "a", "T"]),
       ("P", [ "c" ]),
       ("Q", [ "b" ]),
       ("T", [ "b" ]),
       ]
print isambig(grammar1, "S", ["a", "b"]) == True
print isambig(grammar1, "S", ["c"]) == False

grammar2 = [
       ("A", [ "B", ]),
       ("B", [ "C", ]),
       ("C", [ "D", ]),
       ("D", [ "E", ]),
       ("E", [ "F", ]),
       ("E", [ "G", ]),
       ("E", [ "x", "H", ]),
       ("F", [ "x", "H"]),
       ("G", [ "x", "H"]),
       ("H", [ "y", ]),
       ]
print isambig(grammar2, "A", ["x", "y"]) == True
print isambig(grammar2, "E", ["y"]) == False

grammar3 = [ # Rivers in Kenya
       ("A", [ "B", "C"]),
       ("A", [ "D", ]),
       ("B", [ "Dawa", ]),
       ("C", [ "Gucha", ]),
       ("D", [ "B", "Gucha"]),
       ("A", [ "E", "Mbagathi"]),
       ("A", [ "F", "Nairobi"]),
       ("E", [ "Tsavo" ]),
       ("F", [ "Dawa", "Gucha" ])
       ]
print isambig(grammar3, "A", ["Dawa", "Gucha"]) == True
print isambig(grammar3, "A", ["Dawa", "Gucha", "Nairobi"]) == False
print isambig(grammar3, "A", ["Tsavo"]) == False

I have added my own test case. I'm not sure if this is correct, but I can only see possible one parse tree leading to the string "a b", so the string does not prove the grammar ambiguous. And I do not believe the grammar is ambiguous.

grammar4 = [ # Simple test case
       ("S", [ "P", "Q"]),
       ("P", [ "a", ]),
       ("Q", [ "b", ]),
       ]
print isambig(grammar4, "S", ["a", "b"]) == False

Here is the "official" program:

def expand(tokens_and_derivation, grammar):
    (tokens,derivation) = tokens_and_derivation
    for token_pos in range(len(tokens)):
        for rule_index in range(len(grammar)):
            rule = grammar[rule_index]
            if tokens[token_pos] == rule[0]:
                yield ((tokens[0:token_pos] + rule[1] + tokens[token_pos+1:]), derivation + [rule_index])

def isambig(grammar, start, utterance):
    enumerated = [([start], [])]
    while True:
        new_enumerated = enumerated
        for u in enumerated:
            for i in expand(u,grammar):
                if not i in new_enumerated:
                    new_enumerated = new_enumerated + [i]

        if new_enumerated != enumerated:
            enumerated = new_enumerated
        else:
            break
    result = [xrange for xrange in enumerated if xrange[0] == utterance]
    print result
    return len(result) > 1

Here is my own, much longer program:

def expand(grammar, symbol):
    result = []
    for rule in grammar:
        if rule[0] == symbol:
            result.append(rule[1])
    return result

def expand_first_nonterminal(grammar, string):
    result = []
    for i in xrange(len(string)):
        if isterminal(grammar, string[i]) == False:
            for j in expand(grammar, string[i]):
                result.append(string[:i]+j+string[i+1:])
            return result
    return None

def full_expand_string(grammar,string, result):
    for i in expand_first_nonterminal(grammar,string):
        if allterminals(grammar,i):
            result.append(i)
        else:
            full_expand_string(grammar,i,result)

def isterminal(grammar,symbol):
    for rule in grammar:
        if rule[0] == symbol:
            return False
    return True

def allterminals(grammar,string):
    for symbol in string:
        if isterminal(grammar,symbol) == False:
            return False
    return True

def returnall(grammar, start):
    result = []
    for rule in grammar:
        if rule[0] == start:
            if allterminals(grammar,rule[1]):
                return rule[1]
            else:
                full_expand_string(grammar, rule[1], result)
    return result

def isambig(grammar, start, utterance):
    count = 0
    for i in returnall(grammar,start):
        if i == utterance:
            count+=1
    if count > 1:
        return True
    else:
        return False

Now, my program passes all the test cases including the one I added (grammar4), but the official solution passes all test cases except the one I added. It seems to me that either the test case is wrong, or the official solution is wrong.

Is the official solution correct? Is my solution correct?

Upvotes: 5

Views: 1915

Answers (1)

segmentationfault
segmentationfault

Reputation: 518

To me it looks like grammar4 is not ambiguous. There is only one parse tree:

S -> PQ
P -> a
Q -> b

    S
    |
 ___|____
P        Q
|        |
a        b

However, the official program says that it is ambiguous because it uses the rules P -> a and Q -> b consecutively:

[(['a', 'b'], [0, 1, 2]), (['a', 'b'], [0, 2, 1])]

(Now there are two rule-sequences 0,1,2 and 0,2,1.)

So the "official" program seems to detect grammar4 incorrectly as being ambiguous.

UPDATE: I looked through your code and did a few tests and, apart from not handling recursion (official version doesn't handle recursion either), your program seems to correctly differentiate between ambiguous and unambiguous.

Simple test:

grammar5 = [ 
             ("S", ["A", "B"]),
             ("S", ["B", "A"]),
             ("A", ["a"]),
             ("B", ["a"]),
           ]   
print(isambig(grammar5, "S", ["a", "a"]))

S -> AB
S -> BA
A -> a
B -> a

    S
    |
 ___|____
A        B
|        |
a        a

    S
    |
 ___|____
B        A
|        |
a        a

Your version returns "ambiguous" (as does the "official" version.)

If you remove ("S", ["B", "A"]), your version correctly switches to "not ambiguous", whereas the other version still returns "ambiguous" (we're back at the grammar4 case.)

Maybe someone else (with a bit more experience than me) can chime in.

UPDATE 2: Ira Baxter mentioned that it is an undecidable problem whether a context-free grammar is ambiguous.

See also How is proving a context free language to be ambiguous undecidable?

Upvotes: 2

Related Questions