Reputation: 28149
I'm trying to use python to parse some sql scripts to extract names of tables and columns affected by a given query. The scripts have variations in format and some of them use this format:
Select col1, col2
FROM Table1
and other times it's like this:
Select
col1, col2
FROM
Table1
My code looks something like this:
tables1 = []
for line in file1:
line2 = line.split()
for i in xrange(len(line2)):
if line2[i] = 'FROM':
tables1.append(line2[i + 1])
Currently I'm parsing row by row but would need some kind of iterator to go to the next row if 'FROM' is the last string in the row. Any suggestions for how to do this?
Upvotes: 0
Views: 422
Reputation: 6637
By using re
module this problem will be solve easily :
import re
def parse_query(query):
res = re.match('select\s+(?P<cols>.+)\s+from\s+(?P<tbl>.+)\s*;?', query, re.I|re.M)
if res is not None:
cols = [c.strip() for c in res.groupdict()['cols'].split(',')]
return {'columns': cols, 'table': res.groupdict()['tbl']}
print parse_info_from_query('''Select col1, col2
FROM Table1''')
print parse_info_from_query('''Select
col1, col2
FROM
Table1''')
Upvotes: 1
Reputation: 2229
using sqlparse module.
go to https://pypi.python.org/pypi/sqlparse, and download the tar.gz and install:
> python setup.py install
then create the test file sqlExamples.sql
:
Select col1, col2
FROM Table1;
Select
col1, col2
FROM
Table1
;
SELECT col3 FROM Table3;
then lets see if the parser can help us.. This is not very efficient script, it's written for learning:
import sqlparse
print "--------------------------------------------------------------------------"
print "loading the file into a string"
print "--------------------------------------------------------------------------"
with open ("sqlExamples.sql", "r") as myfile:
sql = myfile.read()
print sql
print "--------------------------------------------------------------------------"
print "Example 1: using the parser to reformat SQL to a standardized format"
print "--------------------------------------------------------------------------"
formattedSQL = sqlparse.format(sql, reindent=True, keyword_case='upper')
print formattedSQL
print "--------------------------------------------------------------------------"
print "Example 1.A: reformatting statements, to single lines, for string analysis"
print "--------------------------------------------------------------------------"
words = " ".join(formattedSQL.split()).replace('; ', ';\n')
print words
print "--------------------------------------------------------------------------"
print "Example 2: using the parser more directly, to extract coloumns"
print "--------------------------------------------------------------------------"
parsed = sqlparse.parse(sql)
coloumns = []
tables = []
for SQL in parsed:
#For my test cases, the name of the statement and the affected table is the same thing..
if( SQL.get_name() not in tables):
tables.append( SQL.get_name() )
#for debugging print token list:
for token in SQL.tokens:
if token.is_whitespace():
continue
if "SELECT" in SQL.get_type() and token.is_group():
cols = token.value.split(",")
for col in cols:
if col.strip() not in coloumns:
coloumns.append(col.strip())
break
print "tables:" + str(tables)
print "cols:" + str(coloumns)
Upvotes: 1