Reputation: 359
I am trying to read values inside key values of SQL i.e. UPDATE, SET, FROM and WHERE. Using regex, i can get values until first line after where clause, but unable to get complete where clause. Please suggest.
Sample SQL from which value needs to be fetched.
UPDATE dbname.tablename
SET
alias1=T1.col1,
alias2=T1.col2
FROM datafabric_cfd_dea.T1 T1
WHERE
tablename.keycol =T1.keycol
AND tablename.col3='ABC'
AND T1.col3='ABC'
AND tablename.col3=T1.col3
AND T1.col2='XYZ';
Regex:
UPDATE\s*(.*)\s*SET\s*(.*)\s*FROM\s*(.*)\s*WHERE\s*(.*)
Please suggest.
Upvotes: 0
Views: 212
Reputation: 474221
I don't think regular expressions are the most appropriate tool for the job. As we are operating in the Python ecosystem we are lucky to have the luxury of using the power of the huge set of third-party packages on PyPI. There is a popular package called sqlparse
which implements an SQL parser.
You may parse your SQL statement and analyze the parsed tokens:
In [1]: import sqlparse
In [2]: sql = """ UPDATE dbname.tablename
...: SET
...: alias1=T1.col1,
...: alias2=T1.col2
...: FROM datafabric_cfd_dea.T1 T1
...: WHERE
...: tablename.keycol =T1.keycol
...: AND tablename.col3='ABC'
...: AND T1.col3='ABC'
...: AND tablename.col3=T1.col3
...: AND T1.col2='XYZ';"""
In [3]: parsed_sql = sqlparse.parse(sql)
# get the where clause
In [4]: where_clause = next(token for token in parsed_sql[0].tokens
if isinstance(token, sqlparse.sql.Where))
In [5]: where_clause.tokens
Out[5]:
[<Keyword 'WHERE' at 0x10FCD1A78>,
<Newline ' ' at 0x10FCD1668>,
<Whitespace ' ' at 0x10FCD1B48>,
<Comparison 'tablen...' at 0x10FCCDB50>,
<Newline ' ' at 0x10FCD1AE0>,
...
<Comparison 'T1.col...' at 0x10FCCDD50>,
<Punctuation ';' at 0x10FCF5EF0>]
Upvotes: 1