Reputation: 151
I'm writing a regular expression code to extract items from a SQL statement. I want to pull out a tuple of the items after SELECT * FROM
and WHERE
statements.
pattern = '''
SELECT\s\*\sFROM\s ([a-zA-Z\s*\,*\=\.]+) \sWHERE \s*([a-zA-Z\d*\s*\,*\=*\.*]+)'''
From the pattern above, I'm using it for the SQL statement below, and get the following results:
('A, B, C\n ', 'A.ColA = B.ColB AND B.ColB2=C.ColC')
I know it's because of the huge break before the WHERE
, but how do I factor that in and get rid of the huge space and /n
after my first tuple?
q = """SELECT * FROM A, B, C
WHERE A.ColA = B.ColB AND B.ColB2=C.ColC"""
Upvotes: 0
Views: 54
Reputation: 71451
You can loop over the returned result and use re.sub
:
import re
results = ('A, B, C\n ', 'A.ColA = B.ColB AND B.ColB2=C.ColC')
new_results = tuple(re.sub('\n\s+$', '', i) for i in results)
Output:
('A, B, C', 'A.ColA = B.ColB AND B.ColB2=C.ColC')
Or, you can alter your pattern:
import re
q = """SELECT * FROM A, B, C
WHERE A.ColA = B.ColB AND B.ColB2=C.ColC"""
new_data = re.findall('(?<=SELECT \* FROM\s)[A-Z\s,]+(?=\n)|(?<=WHERE\s)[\w\W]+', q)
Output:
['A, B, C', 'A.ColA = B.ColB AND B.ColB2=C.ColC']
Upvotes: 1