YabMas
YabMas

Reputation: 13

Best way to parse SQL statement

Context

We're trying to built a database-application that can help first-year students prepare for their SQL-exam by assessing if a SQL-statement is a solution for the given question. The basic version simply checks if the given answer exists in the database as a known correct answer, if not it is sent to the teacher who then adds it to the database or discards it. In the next iteration we would like the application to asses answers more in the way it's done at exams, so you can still get some points even if parts of your query/statement are wrong.

Goal

To get this done we need to be able to 'break up' a statement.

For example, the answer:

SELECT Movie.movie_id AS 'Movie ID', Movie.title AS 'Movie Title', COUNT(*) AS 'Nr of directors'  
FROM Movie  
INNER JOIN Movie_Director  
ON Movie.movie_id = Movie_Director.movie_id  
WHERE Movie.publication_year = 2003  
GROUP BY Movie.movie_id, Movie.title  
HAVING COUNT(*) > 1  
ORDER BY 3 DESC  

Should be broken down into all the different clauses (select/from/inner join/where/having/....)

There doesn't seem to be an easy way to add a table to show the desired output and my attempt to make one only makes it less clear. And really the exact output format doesn't matter that much. I hope the intention is clear.

Besides this type of query the application needs to be able to asses: CREATE/ALTER table (adding PK, FK or Check-constraints) and DELETE/INSERT/UPDATE types of statements.

What I found so far

Looking online for SQL-parsers gives plenty results, but all in different languages. Our applications needs to run entirely inside SQL Server. SQL Server seems to parse every query/statement, but it's not clear to me if the results of this are accessible and/or useful to me.

Questions

Upvotes: 0

Views: 4528

Answers (2)

lptr
lptr

Reputation: 6808

using a clr parser function, dbo.parseSqlToXml()

declare @sql Nvarchar(max)= N'
SELECT Movie.movie_id AS ''Movie ID'', Movie.title AS ''Movie Title'', COUNT(*) AS ''Nr of directors''  
FROM Movie  
INNER JOIN Movie_Director  
ON Movie.movie_id = Movie_Director.movie_id  
WHERE Movie.publication_year = 2003  
GROUP BY Movie.movie_id, Movie.title  
HAVING COUNT(*) > 1  
ORDER BY 3 DESC
';

select 
t.thexml.query('data(SqlScript/Errors/Error)').value('.', 'Nvarchar(max)') as _errors,
s.sel.value('comment()[1]', 'varchar(max)') as _query,
s.sel.value('(SqlSelectClause/comment())[1]', 'Nvarchar(max)') as _select,
s.sel.value('(SqlFromClause/comment())[1]', 'Nvarchar(max)') as _from,
s.sel.value('(SqlWhereClause/comment())[1]', 'Nvarchar(max)') as _where,
s.sel.value('(SqlGroupByClause/comment())[1]', 'Nvarchar(max)') as _groupby,
s.sel.value('(SqlHavingClause/comment())[1]', 'Nvarchar(max)') as _having,
s.sel.value('(../SqlOrderByClause/comment())[1]', 'Nvarchar(max)') as _orderby
from
(
select cast(dbo.parseSqlToXml(@sql) as xml) as thexml
) as t
cross apply t.thexml.nodes('//*[SqlSelectClause]') as s(sel);

...will return

+---------+----------------------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------+-------------------------------------+--------------------------------------+---------------------+-----------------+
| _errors |                                   _query                                   |                              _select                               |                   _from                    |               _where                |               _groupby               |       _having       |    _orderby     |
+---------+----------------------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------+-------------------------------------+--------------------------------------+---------------------+-----------------+
|         | SELECT Movie.movie_id AS 'Movie ID', Movie.title AS ...HAVING COUNT(*) > 1 | SELECT Movie.movie_id AS 'Movie ID', Movie... AS 'Nr of directors' | FROM Movie   INNER JOIN Movie_Director.... | WHERE Movie.publication_year = 2003 | GROUP BY Movie.movie_id, Movie.title | HAVING COUNT(*) > 1 | ORDER BY 3 DESC |
+---------+----------------------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------+-------------------------------------+--------------------------------------+---------------------+-----------------+

Upvotes: 1

Stefan
Stefan

Reputation: 25

For MSSQL Server you can use Microsoft.SqlServer.Management.SqlParser.Parser from Microsoft.SqlServer.Management.SqlParser.dll assembly.

the result of the parser will be array of tokens: https://learn.microsoft.com/en-us/dotnet/api/microsoft.sqlserver.management.sqlparser.parser.tokens?redirectedfrom=MSDN&view=sql-smo-150

Upvotes: 0

Related Questions