I have a problem replacing ebnf rules with regex in a Tatsu grammar

Question

I have developed a syntax checker for the Gerber format, using Tatsu. It works fine, my thanks to the Tatsu developers. However, it is not overly fast, and I am now optimizing the grammar.

The Gerber format is a stream of commands, which the main loop of the grammar handles as follows:

start =

{
    | ['X' integer] ['Y' integer] ((['I' integer 'J' integer] 'D01*')|'D02*'|'D03*')
    | ('G01*'|'G02*'|'G03*'|'G1*'|'G2*'|'G3*')
    ... about 25 rules
}*
M02
$;

with integer = /[+-]?[0-9]+/;

In big files, where performance matters, the vast majority of the statements are covered by the first rule in the choice. (It is actually three commands. Putting them first and merging them to eliminate common elements made the checker 2-3 times faster.) Now I am trying to replace the first rule with a regex, on the assumption that a regex is faster because the regex engine is implemented in C.

In the first step I inlined the integer:

    | ['X' /[+-]?[0-9]+/] ['Y' /[+-]?[0-9]+/] ((['I' /[+-]?[0-9]+/ 'J' /[+-]?[0-9]+/] 'D01*')|'D02*'|'D03*')

This worked fine and gave a modest speedup.

Then I tried to turn the whole rule into a regex. That failed. As a test, I replaced only the first element of the sequence with a regex:

    | /(X[+-]?[0-9]+)?/ ['Y' /[+-]?[0-9]+/] ((['I' /[+-]?[0-9]+/ 'J' /[+-]?[0-9]+/] 'D01*')|'D02*'|'D03*')

This fails to recognize the following command: X81479571Y-38450761D01*

I cannot see the difference between ['X' /[+-]?[0-9]+/] and /(X[+-]?[0-9]+)?/
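For comparison, here is a minimal sketch of how the same pattern behaves in plain Python `re`, outside Tatsu. The names `INT`, `OPT_X` and `COMMAND` are illustrative only, and `COMMAND` is an assumed single-regex form of the first rule, not something taken from the grammar above. Unlike the grammar, these patterns do not skip whitespace between elements.

```python
import re

# Same integer pattern as the `integer` rule in the grammar.
INT = r"[+-]?[0-9]+"

# The optional-X fragment used in the failing version of the rule.
OPT_X = re.compile(rf"(X{INT})?")

# Hypothetical single-regex form of the whole first rule
# (an assumption for illustration, not part of the original grammar).
COMMAND = re.compile(
    rf"(X{INT})?(Y{INT})?((I{INT}J{INT})?D01|D02|D03)\*"
)

sample = "X81479571Y-38450761D01*"

# In plain `re`, the optional group consumes the X coordinate.
print(OPT_X.match(sample).group(0))

# The whole command is accepted by the combined pattern.
print(COMMAND.fullmatch(sample) is not None)

# When there is no X coordinate, the pattern still succeeds,
# but with an empty match.
print(repr(OPT_X.match("Y5D02*").group(0)))
```

If this behaves as expected, the regex itself is not the problem, and the difference lies in how the parser treats the pattern's result (for example, capture groups or empty matches).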

What am I missing?

