Reputation: 1
I have developed a syntax checker for the Gerber format using TatSu. It works fine; my thanks to the TatSu developers. However, it is not particularly fast, so I am now optimizing the grammar.
The Gerber format is a stream of commands, which the main loop of the grammar handles as follows:
start =
{
| ['X' integer] ['Y' integer] ((['I' integer 'J' integer] 'D01*')|'D02*'|'D03*')
| ('G01*'|'G02*'|'G03*'|'G1*'|'G2*'|'G3*')
... about 25 rules
}*
M02
$;
with integer = /[+-]?[0-9]+/;
In big files, where performance matters, the vast majority of the statements are covered by the first rule in the choice. (It actually covers three commands. Putting them first and merging them to eliminate common elements made the checker 2-3 times faster.) Now I am trying to replace the first rule with a regex, on the assumption that a regex is faster because the regex engine is implemented in C.
As a first step, I inlined the integer rule:
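For reference, a single regex covering the whole first rule could look like the sketch below. It uses Python's re module (TatSu patterns are Python regular expressions); the names INT and COORD_CMD are illustrative only, and the sketch assumes there is no whitespace inside a command, whereas the token-based rule tolerates whitespace between its elements.

```python
import re

# Illustrative sketch: one regex equivalent to the first rule of the choice
#   ['X' integer] ['Y' integer] ((['I' integer 'J' integer] 'D01*') | 'D02*' | 'D03*')
# with integer = [+-]?[0-9]+. The '*' terminators must be escaped in a regex.
INT = r'[+-]?[0-9]+'
COORD_CMD = re.compile(
    rf'(?:X{INT})?'                 # optional X coordinate
    rf'(?:Y{INT})?'                 # optional Y coordinate
    rf'(?:(?:I{INT}J{INT})?D01\*'   # optional I/J offsets, then D01*
    r'|D02\*'                       # or D02*
    r'|D03\*)'                      # or D03*
)

for cmd in ["X81479571Y-38450761D01*", "X100I-5J7D01*", "D02*", "X1Y2D04*"]:
    # fullmatch: the whole command must match, not just a prefix
    print(cmd, COORD_CMD.fullmatch(cmd) is not None)
# expected: True, True, True, False
```

Using fullmatch here is only for testing single commands; inside a grammar the pattern is applied at the parser's current position instead.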
| ['X' /[+-]?[0-9]+/] ['Y' /[+-]?[0-9]+/] ((['I' /[+-]?[0-9]+/ 'J' /[+-]?[0-9]+/] 'D01*')|'D02*'|'D03*')
This worked fine and gave a modest speedup.
Then I tried to turn the whole rule into a regex. It failed. As a test, I modified only the first element of the sequence (the optional X coordinate):
| /(X[+-]?[0-9]+)?/ ['Y' /[+-]?[0-9]+/] ((['I' /[+-]?[0-9]+/ 'J' /[+-]?[0-9]+/] 'D01*')|'D02*'|'D03*')
This fails to recognize the following command: X81479571Y-38450761D01*
I cannot see the difference between ['X' /[+-]?[0-9]+/] and /(X[+-]?[0-9]+)?/
What am I missing?
Upvotes: 0
Views: 136
Reputation: 9244
The difference is that an optional expression with [] will advance over whitespace and comments, while a pattern expression with // will not. It's in the documentation. A trick for this case is to place the pattern in its own rule, with a name starting with a lower-case letter, so that whitespace and comments are skipped before the pattern is applied, though I don't think adding that indirection will help performance.
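The effect can be reproduced outside TatSu with Python's re module. This is a hand-written emulation, not TatSu's internal code, and it assumes the failing command follows a previous command and a newline, as is typical in Gerber files. Because the pattern's optional group can match the empty string, it succeeds without consuming anything at the newline, so X81479571 is never consumed and the rest of the rule then finds X where it expects Y, I/J or a D code.

```python
import re

# The pattern from the question: an optional X coordinate.
X_OPT = re.compile(r'(X[+-]?[0-9]+)?')
# Illustrative stand-in for the whitespace skipping the parser does before a token.
WS = re.compile(r'\s*')

# Assumed input: the failing command follows "G01*" and a newline.
text = "G01*\nX81479571Y-38450761D01*"
pos = text.index("\n")  # parser position right after "G01*"

# Pattern applied directly at pos: the optional group matches the empty string.
direct = X_OPT.match(text, pos)
print(repr(direct.group(0)))   # ''

# Skip whitespace first (as ['X' ...] or a lower-case wrapper rule does), then match.
after_ws = WS.match(text, pos).end()
skipped = X_OPT.match(text, after_ws)
print(repr(skipped.group(0)))  # 'X81479571'
```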
As to optimization, a trick I've used in the "...25 more rules" case is to group rules with similar prefixes under a &lookahead, for example &/G0/ in your case.
TatSu is designed to favor friendliness to grammar writers over performance. If you need blazing speed, for example by generating parsers in C, you may want to take a look at pegen, the predecessor of the new PEG parser in CPython.
Upvotes: 0