Reputation: 1549
I am trying to use ANTLR4 to parse Verilog code. I am using the Verilog grammar found here https://github.com/antlr/grammars-v4/blob/master/verilog/Verilog2001.g4
The sample code is
module blinker(
    input clk,
    input rst,
    output blink
    );
    reg [24:0] counter_d, counter_q;
    assign blink = counter_q[24];
    always @(*) begin
        counter_d = counter_q + 1'b1;
    end
    always @(posedge clk) begin
        if (rst) begin
            counter_q <= 25'b0;
        end else begin
            counter_q <= counter_d;
        end
    end
endmodule
The problem is the line
always @(*) begin
The (*) is getting split into the tokens '(*' and ')'.
On line 723 of the grammar file there is
event_control :
'@' event_identifier
| '@' '(' event_expression ')'
| '@' '*'
| '@' '(' '*' ')'
;
This should match the @(*) line if it weren't for line 1329:
attribute_instance : '(*' attr_spec ( ',' attr_spec )* '*)' ;
I'm new to all of this, but I'm guessing that the '(*' token from that line is matching the (* in the code and screwing things up.
After reading a bit from The Definitive ANTLR 4 Reference, I thought that the rule first defined would take precedence. However, I think that it's doing a greedy match?
Any ideas on how to fix the grammar?
Upvotes: 2
Views: 573
Reputation: 5962
I just tweaked the grammar as Bart suggested, and it seems to parse. I also removed some extra optional braces that were causing warnings. Please pull the latest version and try again. Ter
Upvotes: 1
Reputation: 170278
I'm new to all of this, but I'm guessing that the '(*' token from that line is matching the (* in the code and screwing things up.
You are correct.
After reading a bit from The Definitive ANTLR 4 Reference, I thought that the rule first defined would take precedence. However, I think that it's doing a greedy match?
Although they are defined in parser rules, literal tokens are really lexer rules. Lexer rules take precedence in the order they're defined only when they match the same number of characters. If a lexer rule can match more characters, it wins (as you observed).
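To make the longest-match behaviour concrete, here is a minimal sketch in Python. It is not ANTLR's actual implementation, only a toy lexer that follows the same rule described above: at each position the longest matching literal wins, and definition order only breaks ties. The function name tokenize and the literal lists are made up for this illustration.

```python
def tokenize(text, literals):
    """Toy maximal-munch lexer (an illustration, not ANTLR itself).

    At each position, pick the longest literal that matches.
    If two literals have the same length, the one listed first wins.
    Whitespace is skipped, as a typical grammar would do.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best = None
        for lit in literals:
            # Strictly longer matches replace the current best,
            # so earlier literals win ties.
            if text.startswith(lit, pos) and (best is None or len(lit) > len(best)):
                best = lit
        if best is None:
            raise ValueError(f"no token matches at position {pos}: {text[pos]!r}")
        tokens.append(best)
        pos += len(best)
    return tokens


# With the '(*' literal present (as introduced by attribute_instance),
# the lexer never produces a separate '(' followed by '*'.
with_attr = tokenize("@(*)", ["@", "(", "*", ")", "(*", "*)"])

# Without it, the same input splits into single-character tokens.
without_attr = tokenize("@(*)", ["@", "(", "*", ")"])

print(with_attr)     # ['@', '(*', ')']
print(without_attr)  # ['@', '(', '*', ')']
```

Note that '(' is listed before '(*' in the first call and still loses, because the longer match takes priority over definition order.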
I don't know any Verilog, but a quick fix for it would be to let the attribute_instance look like:
attribute_instance : '(' '*' attr_spec ( ',' attr_spec )* '*' ')' ;
However, if the lexer discards characters such as spaces, then the input "( *" (parenthesis, space, star) would also be matched as the start of an attribute_instance. If that is not desirable, you could let your event_control look like this:
event_control
: '@' event_identifier
| '@' '(' event_expression ')'
| '@' '*'
| '@' ( '(' '*' | '(*' ) ')'
;
Note the ( '(' '*' | '(*' ) in the last alternative, which matches either two single tokens, '(' and '*' (with possible spaces in between!), or the single token '(*'.
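As a rough check of that last alternative, the sketch below models it in Python. It is only an approximation under stated assumptions: the helper names lex and is_star_event_control are hypothetical, the regex lists the two-character literals first to imitate longest-match for this small token set, and anything the regex does not recognise (including whitespace) is silently skipped.

```python
import re

# Two-character literals come first so that the ordered regex alternation
# behaves like longest-match for this particular set of tokens.
TOKEN_RE = re.compile(r"\(\*|\*\)|[@()*]")


def lex(text):
    """Return the literal tokens found in text, skipping everything else."""
    return TOKEN_RE.findall(text)


def is_star_event_control(tokens):
    """Model of the alternative: '@' ( '(' '*' | '(*' ) ')'"""
    return tokens in (["@", "(", "*", ")"], ["@", "(*", ")"])


compact = lex("@(*)")     # lexed with the single '(*' token
spaced = lex("@( * )")    # lexed as separate '(' and '*' tokens
trailing = lex("@( *)")   # here '*' and ')' fuse into '*)'

print(compact, is_star_event_control(compact))
print(spaced, is_star_event_control(spaced))
print(trailing, is_star_event_control(trailing))
```

In this toy model both @(*) and @( * ) are accepted, while @( *) is not, because the closing '*)' literal swallows the star. Whether that last spelling matters for real Verilog input is something to verify against the grammar itself.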
Upvotes: 2