Reputation: 11
I'm trying to extract all the table names in a query using sqlparse but I'm having trouble with subqueries or insert statements wrapped in parenthesis.
I have the following query,
import sqlparse
sql = """
create test.test_table as (
select 1
from fake.table
);
"""
When I look for the token that included the statement in the parenthesis using
y = sqlparse.parse(sql)
for i in y[0].tokens:
if isinstance(i, Identifier):
print(i)
print(i.get_real_name())
I get the following results,
test.test_table as (
select 1
from fake.table
)
test_table
The results come back as just one identifier token. When I attempt to get the table name from inside the parenthesis all the comes back is test.test_table. What I'm ultimately trying to do is extract both table names test.test_table and fake.table
Does anyone have any insight on how I can go about doing this?
Upvotes: 1
Views: 2035
Reputation: 57
This may help, I've been working with Select SQL statements that contain subqueries in them, so usually the format has been:
Select blah from (select blah from table_name) alias
so firstly I ignore the first Select statement, and look for tokens that contain the word Select:
for item in parsed.tokens:
if 'SELECT' in identifier.value.upper():
subquery = identifier.value
The subquery will return
(select blah from table_name) alias
I then have a separate function which removes the outermost parentheses and the alias to give just the subquery script:
def subquery_parsing(subquery, full_tables, tables, alias):
#print(subquery)
#new subquery string ready to parse
res_sub = """"""
#captures the alias outside the parantheses
alias = """"""
#record the number of parentheses as they open and close
paren_cnt = 0
for char in subquery:
#if ( and there's already been a ( , include it
if char == '(' and paren_cnt > 0:
res_sub += char
#if (, add to the count
if char == '(':
paren_cnt += 1
# if ) and there's at least 2 (, include it
if char == ')' and paren_cnt > 1:
res_sub += char
# if ), subtract from the count
if char == ')':
paren_cnt -= 1
# capture the script
if char != '(' and char != ')' and paren_cnt >0:
res_sub += char
# capture the alias
if char != '(' and char != ')' and char != ' ' and paren_cnt == 0:
alias += char
this returns
select blah from table_name
Which you should then be able to run through the sqlparse.parse again and get the table name.
Upvotes: 1