Alchitry

Reputation: 1549

ANTLR Verilog @(*) matching two tokens

I am trying to use ANTLR4 to parse Verilog code. I am using the Verilog grammar found here https://github.com/antlr/grammars-v4/blob/master/verilog/Verilog2001.g4

The sample code is

module blinker(
        input clk,
        input rst,
        output blink
    );

    reg [24:0] counter_d, counter_q;

    assign blink = counter_q[24];

    always @(*) begin
        counter_d = counter_q + 1'b1;
    end

    always @(posedge clk) begin
        if (rst) begin
            counter_q <= 25'b0;
        end else begin
            counter_q <= counter_d;
        end
    end

endmodule

The problem is the line

always @(*) begin

The (*) is getting split into the tokens '(*' and ')'.

On line 723 of the grammar file there is

event_control :
'@' event_identifier
| '@' '(' event_expression ')'
| '@' '*'
| '@' '(' '*' ')'
;

This should match the @(*) line if it weren't for line 1329:

attribute_instance : '(*' attr_spec ( ',' attr_spec )* '*)' ;

I'm new to all of this, but I'm guessing that the '(*' token from that line is matching the (* in the code and screwing things up.

After reading a bit from The Definitive ANTLR 4 Reference, I thought that the rule first defined would take precedence. However, I think that it's doing a greedy match?

Any ideas on how to fix the grammar?

Upvotes: 2

Views: 573

Answers (2)

Terence Parr

Reputation: 5962

I just tweaked the grammar as Bart suggested, and it seems to parse. I also removed some extra optional braces that were causing warnings. Please pull it down and try again. Ter

Upvotes: 1

Bart Kiers

Reputation: 170278

I'm new to all of this, but I'm guessing that the '(*' token from that line is matching the (* in the code and screwing things up.

You are correct.

After reading a bit from The Definitive ANTLR 4 Reference, I thought that the rule first defined would take precedence. However, I think that it's doing a greedy match?

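Although they appear in parser rules, literal tokens such as '(*' are really lexer rules. Lexer rules take precedence in the order they're defined only when they match the same number of characters. If one lexer rule can match more characters, the lexer chooses it (as you observed). So for the input (*) the lexer produces '(*' followed by ')', rather than '(' '*' ')'.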
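The following minimal Python sketch makes the longest-match rule concrete. It is a toy maximal-munch lexer, not ANTLR's implementation. It models only a hypothetical subset of the grammar's literal tokens ('@', '(', '*', ')', '(*', '*)') and assumes whitespace is skipped. At each position it takes the longest literal that matches and breaks ties by definition order.

```python
# Toy maximal-munch lexer (not ANTLR's code): longest match wins,
# ties are broken by the order in which literals are defined.
LITERALS = ["@", "(", "*", ")", "(*", "*)"]  # hypothetical subset of the grammar's literals


def tokenize(text, literals=LITERALS):
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1  # assume whitespace is skipped, like a WS rule with -> skip
            continue
        best = None
        for lit in literals:  # iterate in definition order
            # A strictly longer match replaces the current best;
            # an equal-length match keeps the earlier-defined literal.
            if text.startswith(lit, i) and (best is None or len(lit) > len(best)):
                best = lit
        if best is None:
            raise ValueError(f"no token matches at position {i}: {text[i]!r}")
        tokens.append(best)
        i += len(best)
    return tokens


print(tokenize("@(*)"))    # ['@', '(*', ')']
print(tokenize("@( * )"))  # ['@', '(', '*', ')']
```

Without spaces, '(*' is longer than '(', so it wins. That is why @(*) cannot match the '@' '(' '*' ')' alternative.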

I don't know any Verilog, but a quick fix would be to make attribute_instance look like this:

attribute_instance : '(' '*' attr_spec ( ',' attr_spec )* '*' ')' ;

However, if the lexer discards characters such as spaces, the input "( *" (parenthesis, space, star) would also be matched as the start of an attribute_instance. If that is not desirable, you could make your event_control look like this:
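The whitespace caveat can be illustrated with a small Python sketch. It assumes the quick fix above has been applied and that no other rule still uses the literal '(*'. In that case the only relevant tokens are the single characters '(', '*' and ')'. It also assumes whitespace is skipped. The tokenize helper is hypothetical and is not part of ANTLR.

```python
import re

# Hypothetical token set after the quick fix: '(*' is no longer a literal,
# so only the single-character tokens '(', '*' and ')' are relevant.
# Leading whitespace before each token is consumed and discarded.
TOKEN = re.compile(r"\s*([()*])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()  # drop trailing whitespace so the loop ends cleanly
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


# Both spellings yield the same token stream, so a rule that starts
# with '(' '*' cannot tell them apart.
print(tokenize("(*"))   # ['(', '*']
print(tokenize("( *"))  # ['(', '*']
```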

event_control 
 : '@' event_identifier
 | '@' '(' event_expression ')'
 | '@' '*'
 | '@' ( '(' '*' | '(*' ) ')'
 ;

Note the ( '(' '*' | '(*' ) in the last alternative. It matches either the two separate tokens '(' and '*' (possibly with spaces in between!) or the single token '(*'.
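Below is a hand-written Python recognizer for just the '*'-related alternatives of this event_control. It is a hypothetical sketch, not code generated by ANTLR. It works on token lists like those a longest-match lexer would produce for @(*) and @( * ), and it shows that the grouped alternative accepts both.

```python
def matches_event_control(tokens):
    """Hypothetical recognizer mirroring only these alternatives:
         '@' '*'
       | '@' ( '(' '*' | '(*' ) ')'
    The event_identifier and event_expression alternatives are not modeled."""
    if not tokens or tokens[0] != "@":
        return False
    rest = tokens[1:]
    if rest == ["*"]:  # '@' '*'
        return True
    # ( '(' '*' | '(*' ): either two tokens or the single '(*' token
    if rest[:2] == ["(", "*"]:
        rest = rest[2:]
    elif rest[:1] == ["(*"]:
        rest = rest[1:]
    else:
        return False
    return rest == [")"]  # closing ')'


print(matches_event_control(["@", "(*", ")"]))      # True  (from "@(*)")
print(matches_event_control(["@", "(", "*", ")"]))  # True  (from "@( * )")
```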

Upvotes: 2
