GenXeral
GenXeral

Reputation: 151

Regular Expression python sql statements

I'm writing a regular expression code to extract items from a SQL statement. I want to pull out a tuple of the items after SELECT * FROM and WHERE statements.

pattern = '''
    SELECT\s\*\sFROM\s ([a-zA-Z\s*\,*\=\.]+) \sWHERE \s*([a-zA-Z\d*\s*\,*\=*\.*]+)'''

From the pattern above, I'm using it for the SQL statement below, and get the following results:

('A, B, C\n           ', 'A.ColA = B.ColB AND B.ColB2=C.ColC')

I know it's because of the huge break before the WHERE, but how do I factor that in and get rid of the huge space and /n after my first tuple?

q = """SELECT * FROM A, B, C
            WHERE A.ColA = B.ColB AND B.ColB2=C.ColC"""

Upvotes: 0

Views: 54

Answers (1)

Ajax1234
Ajax1234

Reputation: 71451

You can loop over the returned result and use re.sub:

import re
results = ('A, B, C\n           ', 'A.ColA = B.ColB AND B.ColB2=C.ColC')
new_results = tuple(re.sub('\n\s+$', '', i) for i in results)

Output:

('A, B, C', 'A.ColA = B.ColB AND B.ColB2=C.ColC')

Or, you can alter your pattern:

import re
q = """SELECT * FROM A, B, C
        WHERE A.ColA = B.ColB AND B.ColB2=C.ColC"""
new_data = re.findall('(?<=SELECT \* FROM\s)[A-Z\s,]+(?=\n)|(?<=WHERE\s)[\w\W]+', q)

Output:

['A, B, C', 'A.ColA = B.ColB AND B.ColB2=C.ColC']

Upvotes: 1

Related Questions