samuelbrody1249
samuelbrody1249

Reputation: 4767

Whether to parse multiple statements at once or not

I noticed that in the MySQLParser.g4 file that it only handles one statement at a time:

query:
    EOF
    | (simpleStatement | beginWork) (SEMICOLON_SYMBOL EOF? | EOF)
;

simpleStatement:
    // DDL
    alterStatement
    | createStatement
    ...

Why is this choice made instead of parsing the entire file or script, which may include multiple SQL statments, such as:

CREATE TABLE...;
INSERT INTO ...;
INSERT INTO ...;
# could be thousands of statements here

Is this for efficiency so that the parser only handles one statement at a time so that it doesn't have to consume as much memory? Or basically, why is the choice made in the parser here to only do one statement at a time, and if that's the case, how would it parse multiple statements at once, for example in MySQL Workbench if I have these two statements:

enter image description here

Finally, for testing purposes, is this a good way to add a convenience method for debugging in IntelliJ, or how would this normally be done if the grammar only expects one statement at a time, and you want to, for example, test to make sure all ten statements are correct?

root
    : EOF
    // this line is for testing only
    | selectStatement (SEMICOLON selectStatement)* (SEMICOLON EOF? | EOF)
    // this line is for the actual parser
    | selectStatement (SEMICOLON EOF? | EOF)
    ;

Upvotes: 0

Views: 381

Answers (2)

Mike Lischke
Mike Lischke

Reputation: 53502

There are several arguments pro single-statement processing:

  • Killer Reason: Each statement can use a different delimiter (which you cannot handle in the parser grammar). Delimiters are not part of the SQL syntax.
  • In editors you will want to know where a statement starts and ends, without first parsing the full text (think of megabytes sized dumps), e.g. for executing a single statement.
  • You don't want to miss all following statement details, if only one statement contains a syntax error.
  • Parsing a single statement at a time gives you much better response times (e.g. when editing SQL code while it's still being parsed).
  • The server can process single statements only, anyway.

I implemented the statement handling in MySQL Workbench and did it the same way in MySQL Shell for VS Code. The statement splitter is usually very fast (100ms in C++ for a million statements, depending on the box it runs on). This allows to do a quick first run for the statement ranges, show the statement indicator and make the editor ready for statement requests (for execution). After that a background thread is used to parse the individual statements for errors, which can be stopped at any time when a statement was edited.

Upvotes: 2

Mike Cargal
Mike Cargal

Reputation: 6785

Whether to allow for multiple statements is pretty much just a grammar design choice. Depending upon the context, it might have been more straightforward to know you'll only see a single statement at a time, or that you can easily separate multiple statements and send each to a parser.

It does look like it would be useful for you.

A simpler version would be:

root : selectStatement? (SEMICOLON selectStatement)* SEMICOLON? EOF

you should always have an EOF

Another thing that doesn't always dawn on designers is that your grammar can have multiple start rules. So you could also have a

selectStart: selectStatement SEMICOLO? EOF;

rule that only allows for a single statement and depending upon you situation you can choose which start rule to use. I had a graphical tool for a language I wrote so sometimes I parsed exprs, sometimes stmts and sometimes scripts. Each had its own start rule. But don't forget to end a start rule with an EOF. This forces the parser to look at ALL of your input. Without it, it will parse as much as it can, but will ignore training input that doesn't fit a parse rule. (well, it's possible not to have EOF, IF you have a custom stream that remains open so there is no end of input. However, this would not be the case in your situation.)

Upvotes: 1

Related Questions