Reputation: 4094
I have the following grammar for the setlx language in PLY:
Rule 0 S' -> file_input
Rule 1 file_input -> statement_list
Rule 2 epsilon -> <empty>
Rule 3 statement_list -> statement
Rule 4 statement_list -> statement_list statement
Rule 5 statement -> simple_statement SEMICOLON
Rule 6 statement -> compound_statement
Rule 7 simple_statement -> expression_statement
Rule 8 simple_statement -> assert_statement
Rule 9 simple_statement -> assignment_statement
Rule 10 simple_statement -> augmented_assign_statement
Rule 11 simple_statement -> backtrack_statement
Rule 12 simple_statement -> break_statement
Rule 13 simple_statement -> continue_statement
Rule 14 simple_statement -> exit_statement
Rule 15 simple_statement -> return_statement
Rule 16 simple_statement -> quantor
Rule 17 simple_statement -> term
Rule 18 expression_statement -> expression
Rule 19 backtrack_statement -> BACKTRACK
Rule 20 break_statement -> BREAK
Rule 21 continue_statement -> CONTINUE
Rule 22 exit_statement -> EXIT
Rule 23 return_statement -> RETURN
Rule 24 return_statement -> RETURN expression
Rule 25 expression_list -> expression
Rule 26 expression_list -> expression_list COMMA expression
Rule 27 expression -> implication
Rule 28 expression -> lambda_definition
Rule 29 expression -> implication EQUIVALENT implication
Rule 30 expression -> implication ANTIVALENT implication
Rule 31 implication -> disjunction
Rule 32 implication -> disjunction IMPLICATES disjunction
Rule 33 disjunction -> conjunction
Rule 34 disjunction -> disjunction OR conjunction
Rule 35 conjunction -> comparison
Rule 36 conjunction -> conjunction AND comparison
Rule 37 comparison -> sum
Rule 38 comparison -> sum EQ sum
Rule 39 comparison -> sum NEQ sum
Rule 40 comparison -> sum LT sum
Rule 41 comparison -> sum LE sum
Rule 42 comparison -> sum GT sum
Rule 43 comparison -> sum GE sum
Rule 44 comparison -> sum IN sum
Rule 45 comparison -> sum NOTIN sum
Rule 46 sum -> product
Rule 47 sum -> sum PLUS product
Rule 48 sum -> sum MINUS product
Rule 49 product -> reduce
Rule 50 product -> product TIMES reduce
Rule 51 product -> product DIVIDE reduce
Rule 52 product -> product IDIVIDE reduce
Rule 53 product -> product MOD reduce
Rule 54 product -> product CARTESIAN reduce
Rule 55 reduce -> unary_expression
Rule 56 reduce -> reduce SUM unary_expression
Rule 57 reduce -> reduce PRODUCT unary_expression
Rule 58 unary_expression -> power
Rule 59 unary_expression -> SUM unary_expression
Rule 60 unary_expression -> PRODUCT unary_expression
Rule 61 unary_expression -> HASH unary_expression
Rule 62 unary_expression -> MINUS unary_expression
Rule 63 unary_expression -> AT unary_expression
Rule 64 unary_expression -> BANG unary_expression
Rule 65 power -> primary
Rule 66 power -> primary POW unary_expression
Rule 67 primary -> atom
Rule 68 primary -> attributeref
Rule 69 primary -> subscription
Rule 70 primary -> slicing
Rule 71 primary -> procedure
Rule 72 primary -> call
Rule 73 primary -> primary BANG
Rule 74 atom -> identifier
Rule 75 atom -> literal
Rule 76 atom -> enclosure
Rule 77 identifier -> IDENTIFIER
Rule 78 identifier -> UNUSED
Rule 79 attributeref -> primary DOT identifier
Rule 80 subscription -> primary LBRACKET expression RBRACKET
Rule 81 slicing -> primary LBRACKET lower_bound RANGE upper_bound RBRACKET
Rule 82 lower_bound -> expression
Rule 83 lower_bound -> epsilon
Rule 84 upper_bound -> expression
Rule 85 upper_bound -> epsilon
Rule 86 literal -> stringliteral
Rule 87 literal -> integer
Rule 88 literal -> floatnumber
Rule 89 literal -> boolean
Rule 90 stringliteral -> STRING
Rule 91 stringliteral -> LITERAL
Rule 92 integer -> INTEGER
Rule 93 floatnumber -> DOUBLE
Rule 94 boolean -> TRUE
Rule 95 boolean -> FALSE
Rule 96 enclosure -> parenth_form
Rule 97 enclosure -> set_display
Rule 98 enclosure -> list_display
Rule 99 parenth_form -> LPAREN expression RPAREN
Rule 100 set_display -> LBRACE expression RANGE expression RBRACE
Rule 101 set_display -> LBRACE expression COMMA expression RANGE expression RBRACE
Rule 102 set_display -> LPAREN argument_list RPAREN
Rule 103 list_display -> LBRACKET expression RANGE expression RBRACKET
Rule 104 list_display -> LBRACKET expression COMMA expression RANGE expression RBRACKET
Rule 105 list_display -> LBRACKET argument_list RBRACKET
Rule 106 lambda_definition -> lambda_parameters LAMBDADEF expression
Rule 107 lambda_parameters -> identifier
Rule 108 lambda_parameters -> LT parameter_list GT
Rule 109 assignment_statement -> target ASSIGN expression
Rule 110 target -> expression
Rule 111 augmented_assign_statement -> augtarget augop expression
Rule 112 augtarget -> identifier
Rule 113 augtarget -> attributeref
Rule 114 augtarget -> subscription
Rule 115 augop -> PLUS_EQUAL
Rule 116 augop -> MINUS_EQUAL
Rule 117 augop -> TIMES_EQUAL
Rule 118 augop -> DIVIDE_EQUAL
Rule 119 augop -> IDIVIDE_EQUAL
Rule 120 augop -> MOD_EQUAL
Rule 121 assert_statement -> ASSERT LPAREN expression COMMA expression RPAREN
Rule 122 term -> TERM LPAREN term_arguments RPAREN
Rule 123 term_arguments -> expression_list
Rule 124 term_arguments -> epsilon
Rule 125 procedure -> PROCEDURE LPAREN parameter_list RPAREN LBRACE block RBRACE
Rule 126 procedure -> CPROCEDURE LPAREN parameter_list RPAREN LBRACE block RBRACE
Rule 127 parameter_list -> procedure_param
Rule 128 parameter_list -> parameter_list COMMA procedure_param
Rule 129 parameter_list -> epsilon
Rule 130 procedure_param -> identifier
Rule 131 call -> primary LPAREN argument_list RPAREN
Rule 132 call -> primary LPAREN RPAREN
Rule 133 argument_list -> expression
Rule 134 argument_list -> argument_list COMMA expression
Rule 135 quantor -> FORALL LPAREN iterator_chain PIPE expression RPAREN
Rule 136 quantor -> EXISTS LPAREN iterator_chain PIPE expression RPAREN
Rule 137 iterator -> target IN expression
Rule 138 iterator_chain -> iterator
Rule 139 iterator_chain -> iterator_chain COMMA iterator
Rule 140 compound_statement -> if_statement
Rule 141 compound_statement -> switch_statement
Rule 142 compound_statement -> match_statement
Rule 143 compound_statement -> while_loop
Rule 144 compound_statement -> do_while_loop
Rule 145 compound_statement -> for_loop
Rule 146 block -> statement_list
Rule 147 block -> epsilon
Rule 148 if_statement -> IF LPAREN expression RPAREN LBRACE block RBRACE
Rule 149 if_statement -> IF LPAREN expression RPAREN LBRACE block RBRACE ELSE LBRACE block RBRACE
Rule 150 if_statement -> IF LPAREN expression RPAREN LBRACE block RBRACE ELSE if_statement
Rule 151 switch_statement -> SWITCH LBRACE case_statements default_case RBRACE
Rule 152 case_statements -> case_list
Rule 153 case_statements -> epsilon
Rule 154 case_list -> case_statement
Rule 155 case_list -> case_list case_statement
Rule 156 case_statement -> CASE expression COLON block
Rule 157 default_case -> DEFAULT COLON block
Rule 158 default_case -> epsilon
Rule 159 match_statement -> MATCH
Rule 160 while_loop -> WHILE LPAREN expression RPAREN LBRACE block RBRACE
Rule 161 do_while_loop -> DO LBRACE block RBRACE WHILE LPAREN expression RPAREN SEMICOLON
Rule 162 for_loop -> FOR LPAREN iterator_chain RPAREN LBRACE block RBRACE
On the last few meters, I now get some conflicts:
WARNING:
WARNING: Conflicts:
WARNING:
WARNING: shift/reduce conflict for IN in state 34 resolved as shift
WARNING: shift/reduce conflict for COMMA in state 94 resolved as shift
WARNING: shift/reduce conflict for RPAREN in state 154 resolved as shift
How can I resolve them without generating new conflicts? I understand where they come from, but I have no idea about fixing it. Any help or general advice is appriciated.
Upvotes: 1
Views: 4027
Reputation: 241721
I'll do these backwards, because that way we go from easiest to hardest. In fact, I don't really have a solution for the first conflict.
The third conflict is the result of an actual ambiguity in the grammar. You need to get rid of the ambiguity:
Rule 96 enclosure -> parenth_form
Rule 97 enclosure -> set_display
Rule 99 parenth_form -> LPAREN expression RPAREN
Rule 102 set_display -> LPAREN argument_list RPAREN
Rule 133 argument_list -> expression
Consequently, if we're looking for an enclosure
and we find a simple parenthesized expression, it could either be a parenth_form
or it could be a set_display
containing an argument_list
of exactly one expression. I suspect that the intention here is that a simple parenthesized expression would be a parenth_form
, but there's no way to tell from the grammar.
The simplest solution would be to just get rid of parenth_form
altogether, and check for the case of a one-element argument_list
when you build the AST node for a set_display
corresponding to rule 102. Another possibility is to be explicit about it; change Rule 102 to require the set_display
to have at least two expressions:
set_display -> LPAREN expression COMMA argument_list RPAREN
That still requires you to juggle the AST, though, because you have to prepend the expression
to the argument_list
when you build the set_display
node.
The second S/R conflict is actually quite similar. It arises because of:
Rule 104 list_display -> LBRACKET expression COMMA expression RANGE expression RBRACKET
Rule 105 list_display -> LBRACKET argument_list RBRACKET
So:
LBRACKET expression COMMA expression ...
would require reduction by Rule 104 if the following symbol is RANGE
; by Rule 105 if the following symbol is RBRACKET
; and by Rule 134 if the following symbol is COMMA
. (That's a rough approximation, since it assumes that we already know the end of the second expression
.) As written, though, the grammar needs to commit to one of these paths as soon as it sees the first COMMA
, because it needs to decide at that moment whether to create an argument_list
or not.
The solution is to delay the parser's decision, which is easy but ugly:
list_display -> LBRACKET expression RANGE expression RBRACKET
list_display -> LBRACKET expression COMMA expression RANGE expression RBRACKET
list_display -> LBRACKET expression RBRACKET
list_display -> LBRACKET expression COMMA argument_list RBRACKET
Now, the first COMMA
is always shifted and the decision on what type of list_display
to reduce is delayed until the end of the second expression
(if there are two expression
s), but it's necessary to juggle the AST for the last two productions to correct the argument_list
.
The first S/R conflict arises because IN
is used both as an operator and as a syntactic part of an iterator
:
Rule 44 comparison -> sum IN sum
Rule 137 iterator -> target IN expression
But because target
is just an expression
, and expression
can derive sum
, it's not possible (most of the time) for the parser to know which IN
it's looking at until much later in the parse.
The previous technique of deferring the decision won't work here, because you need to know which type of IN
you're looking at in order to correctly apply operator precedence. Suppose we're in a context where we need an iterator
and the input is:
atom1 AND atom2 IN atom3
If that's the iterator (i.e., the next symbol is COMMA
or RPAREN
), then that is, effectively:
( atom1 AND atom2 ) IN atom3
However, if that's the left-hand side of an iterator, then it needs to be parsed completely differently:
( atom1 AND ( atom2 IN atom3 ) ) IN expression
Moreover, atom3
could have been an arbitrary expression, perhaps atom3 AND atom4
, leading to the two parses:
( atom1 AND atom2 ) IN ( atom3 AND atom4 )
( atom1 AND ( atom2 IN atom3 ) AND atom4 ) IN expression
This is why puns are bad in language design.
I strongly suspect that there is no LR(k)
grammar which will be able to parse that particular corner of your language, although that's just based on intuition; I have no proof. However, a GLR parser would have no trouble with it, because it is not actually ambiguous. I don't know if there is a GLR parser generator in Python; if you're not tied to Python, you could certainly use bison
.
The GLR parser would also have solved the second conflict, which is also not the result of an ambiguity.
Upvotes: 5