Reputation: 28149

Python extract info from current line or next line

I'm trying to use python to parse some sql scripts to extract names of tables and columns affected by a given query. The scripts have variations in format and some of them use this format:

Select col1, col2
FROM Table1

and other times it's like this:

 Select 
       col1, col2
    FROM 
       Table1

My code looks something like this:

tables1 = []
for line in file1:
   line2 = line.split()
   for i in xrange(len(line2)):
      if line2[i] = 'FROM':
         tables1.append(line2[i + 1])

Currently I'm parsing row by row but would need some kind of iterator to go to the next row if 'FROM' is the last string in the row. Any suggestions for how to do this?

Upvotes: 0

Answers (2)

M.javid

Reputation: 6637

By using re module this problem will be solve easily :

import re

def parse_query(query):
    res = re.match('select\s+(?P<cols>.+)\s+from\s+(?P<tbl>.+)\s*;?', query, re.I|re.M)
    if res is not None:
        cols = [c.strip() for c in res.groupdict()['cols'].split(',')]
        return {'columns': cols, 'table': res.groupdict()['tbl']}

print parse_info_from_query('''Select col1, col2
                               FROM Table1''')

print parse_info_from_query('''Select 
                                   col1, col2
                               FROM 
                                    Table1''')

Upvotes: 1

Henrik

Reputation: 2229

using sqlparse module.

go to https://pypi.python.org/pypi/sqlparse, and download the tar.gz and install:

> python setup.py install

then create the test file sqlExamples.sql:

Select col1, col2
FROM Table1;

 Select 
       col1, col2
    FROM 
        Table1
       ;
SELECT col3 FROM Table3;

then lets see if the parser can help us.. This is not very efficient script, it's written for learning:

import sqlparse

print "--------------------------------------------------------------------------"
print "loading the file into a string"
print "--------------------------------------------------------------------------"
with open ("sqlExamples.sql", "r") as myfile:
    sql = myfile.read()
print sql

print "--------------------------------------------------------------------------"
print "Example 1: using the parser to reformat SQL to a standardized format"
print "--------------------------------------------------------------------------"
formattedSQL = sqlparse.format(sql, reindent=True, keyword_case='upper')
print formattedSQL

print "--------------------------------------------------------------------------"
print "Example 1.A: reformatting statements, to single lines, for string analysis"
print "--------------------------------------------------------------------------"
words = " ".join(formattedSQL.split()).replace('; ', ';\n')
print words

print "--------------------------------------------------------------------------"
print "Example 2: using the parser more directly, to extract coloumns"
print "--------------------------------------------------------------------------"
parsed = sqlparse.parse(sql)
coloumns = []
tables = []
for SQL in parsed:
    #For my test cases, the name of the statement and the affected table is the same thing..
    if( SQL.get_name() not in tables):
        tables.append( SQL.get_name() )

    #for debugging print token list:
    for token in SQL.tokens:
        if token.is_whitespace():
            continue
        if "SELECT" in SQL.get_type() and token.is_group():
            cols = token.value.split(",")
            for col in cols:
                if col.strip() not in coloumns:
                    coloumns.append(col.strip())
            break

print "tables:" + str(tables)    
print "cols:" + str(coloumns)

Upvotes: 1

Python extract info from current line or next line

Answers (2)

Related Questions