Reputation: 3225
I am looking for an SQL library that will parse an SQL statement and return some sort of object representation of it. My main objective is to parse the statement and retrieve the list of table names it references (including those in subqueries, joins and unions).
I am looking for a free library with a business-friendly license (e.g. the Apache license). I am looking for a library, not an SQL grammar. I do not want to build my own parser.
The best I could find so far was JSQLParser, and the example they give is pretty close to what I am looking for. However, it fails to parse too many valid queries (DB2 database), and I'm hoping to find a more reliable library.
Upvotes: 7
Views: 15520
Reputation: 2933
You need the ultra-light, ultra-fast library to extract table names from SQL. (Disclaimer: I am the owner.)
Just add the following to your pom.xml:
<dependency>
    <groupId>com.github.mnadeem</groupId>
    <artifactId>sql-table-name-parser</artifactId>
    <version>0.0.1</version>
</dependency>
And do the following:
new TableNameParser(sql).tables()
For more details, refer to the project.
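To give a feel for what lightweight extraction involves, here is a minimal, self-contained sketch of a keyword scan. It is not the library's implementation; `NaiveTableScanner` and its `tables` method are hypothetical names made up for illustration. It only looks at the token right after FROM or JOIN, so it assumes no comma-separated FROM lists, no SQL comments, no string literals containing keywords, and no CTE names to filter out.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only: a naive scan, not a real SQL parser.
public class NaiveTableScanner {

    // Words that may follow FROM/JOIN without being a table name.
    private static final Set<String> NOT_A_TABLE =
            new LinkedHashSet<>(Arrays.asList("select", "lateral", "table"));

    public static Set<String> tables(String sql) {
        // Surround parentheses, commas and semicolons with spaces
        // so that each becomes its own token.
        String spaced = sql.replaceAll("([(),;])", " $1 ");
        String[] tokens = spaced.trim().split("\\s+");

        // LinkedHashSet removes duplicates and keeps first-seen order.
        Set<String> found = new LinkedHashSet<>();
        for (int i = 0; i < tokens.length - 1; i++) {
            String word = tokens[i].toLowerCase(Locale.ROOT);
            if (word.equals("from") || word.equals("join")) {
                String next = tokens[i + 1];
                // "(" means a derived table (subquery); its own FROM
                // is picked up later in the same loop.
                if (!next.equals("(")
                        && !NOT_A_TABLE.contains(next.toLowerCase(Locale.ROOT))) {
                    found.add(next);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String sql = "SELECT * FROM a JOIN b ON a.id = b.id "
                + "WHERE a.x IN (SELECT x FROM c) "
                + "UNION SELECT * FROM d";
        System.out.println(tables(sql));
    }
}
```

Subqueries and unions fall out of the flat scan because every FROM in the text is visited regardless of nesting. Anything beyond that (quoted identifiers, `FROM a, b`, INSERT/UPDATE targets) needs real tokenizing, which is the reason to use a library.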
Upvotes: 4
Reputation: 8406
Old question, but I think this project contains what you need:
Data Tools Project - SQL Development Tools
Here's the documentation for the SQL Query Parser.
Also, here's a small sample program. I'm no Java programmer, so use with care.
package org.lala;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.util.Iterator;
import java.util.List;
import org.eclipse.datatools.modelbase.sql.query.QuerySelectStatement;
import org.eclipse.datatools.modelbase.sql.query.QueryStatement;
import org.eclipse.datatools.modelbase.sql.query.TableReference;
import org.eclipse.datatools.modelbase.sql.query.ValueExpressionColumn;
import org.eclipse.datatools.modelbase.sql.query.helper.StatementHelper;
import org.eclipse.datatools.sqltools.parsers.sql.SQLParseErrorInfo;
import org.eclipse.datatools.sqltools.parsers.sql.SQLParserException;
import org.eclipse.datatools.sqltools.parsers.sql.SQLParserInternalException;
import org.eclipse.datatools.sqltools.parsers.sql.query.SQLQueryParseResult;
import org.eclipse.datatools.sqltools.parsers.sql.query.SQLQueryParserManager;
import org.eclipse.datatools.sqltools.parsers.sql.query.SQLQueryParserManagerProvider;
public class SQLTest {

    private static String readFile(String path) throws IOException {
        FileInputStream stream = new FileInputStream(new File(path));
        try {
            FileChannel fc = stream.getChannel();
            MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0,
                    fc.size());
            /* Instead of using default, pass in a decoder. */
            return Charset.defaultCharset().decode(bb).toString();
        } finally {
            stream.close();
        }
    }

    /**
     * @param args
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        try {
            // Create an instance of the parser manager.
            // SQLQueryParserManagerProvider.getInstance().getParserManager
            // returns the best compliant SQLQueryParserManager
            // supporting the SQL dialect of the database described by the
            // given database product information. Here the product is
            // "DB2 UDB" and the version is "v9.1"; if null is passed for
            // both, a generic parser is returned.
            SQLQueryParserManager parserManager = SQLQueryParserManagerProvider
                    .getInstance().getParserManager("DB2 UDB", "v9.1");

            // Read the query to parse from a file
            String sql = readFile("c:\\test.sql");

            // Parse
            SQLQueryParseResult parseResult = parserManager.parseQuery(sql);

            // Get the Query Model object from the result
            QueryStatement resultObject = parseResult.getQueryStatement();

            // Get the SQL text
            String parsedSQL = resultObject.getSQL();
            System.out.println(parsedSQL);

            // Here we have the SQL code parsed!
            // Note: this cast assumes the statement is a SELECT.
            QuerySelectStatement querySelect = (QuerySelectStatement) parseResult
                    .getSQLStatement();

            // Print the result columns of the SELECT
            List columnExprList = StatementHelper
                    .getEffectiveResultColumns(querySelect);
            Iterator columnIt = columnExprList.iterator();
            while (columnIt.hasNext()) {
                ValueExpressionColumn colExpr = (ValueExpressionColumn) columnIt
                        .next();
                // DataType dataType = colExpr.getDataType();
                System.out.println("effective result column: "
                        + colExpr.getName());
                // + " with data type: " + dataType.getName());
            }

            // Print the tables referenced by the statement
            List tableList = StatementHelper.getTablesForStatement(resultObject);
            // List tableList = StatementHelper.getTablesForStatement(querySelect);
            for (Object obj : tableList) {
                TableReference t = (TableReference) obj;
                System.out.println(t.getName());
            }
        } catch (SQLParserException spe) {
            // handle the syntax error
            System.out.println(spe.getMessage());
            @SuppressWarnings("unchecked")
            List<SQLParseErrorInfo> syntacticErrors = spe.getErrorInfoList();
            Iterator<SQLParseErrorInfo> itr = syntacticErrors.iterator();
            while (itr.hasNext()) {
                SQLParseErrorInfo errorInfo = itr.next();
                // Example usage of the SQLParseErrorInfo object
                // the error message
                String errorMessage = errorInfo.getParserErrorMessage();
                String expectedText = errorInfo.getExpectedText();
                String errorSourceText = errorInfo.getErrorSourceText();
                // the line and column where the error starts
                int errorLine = errorInfo.getLineNumberStart();
                int errorColumn = errorInfo.getColumnNumberStart();
                System.err.println("Error in line " + errorLine + ", column "
                        + errorColumn + ": " + expectedText + " "
                        + errorMessage + " " + errorSourceText);
            }
        } catch (SQLParserInternalException spie) {
            // handle the exception
            System.out.println(spie.getMessage());
        }
        System.exit(0);
    }
}
Upvotes: 0
Reputation: 74187
I doubt you'll find anything prewritten that you can just use. The problem is that ISO/ANSI SQL is a very complicated grammar, with something like 600-plus production rules IIRC.
Terence Parr's ANTLR parser generator (written in Java, but it can generate parsers in a number of target languages) has several SQL grammars available, including a couple for PL/SQL, one for a SQL Server SELECT statement, one for MySQL, and one for ISO SQL.
No idea how complete/correct/up-to-date they are.
http://www.antlr.org/grammar/list
Upvotes: 6
Reputation: 51
You needn't reinvent the wheel: a reliable SQL parser library already exists (it's commercial, not free). This article shows how to retrieve the list of table names present in an SQL statement (including subqueries, joins and unions), which is exactly what you are looking for.
This SQL parser library supports Oracle, SQL Server, DB2, MySQL, Teradata and Access.
Upvotes: 5