james rizkallah
james rizkallah

Reputation: 11

How can I extract table names from sub queries using sqlparse?

I'm trying to extract all the table names in a query using sqlparse but I'm having trouble with subqueries or insert statements wrapped in parenthesis.

I have the following query,

import sqlparse 
    
sql = """ 
          create test.test_table as (
                            select 1
                            from fake.table
                        );
        """
    

When I look for the token that included the statement in the parenthesis using

y = sqlparse.parse(sql)

for i in y[0].tokens:
    if isinstance(i, Identifier):
        print(i)
        print(i.get_real_name())

I get the following results,

test.test_table as (
                            select 1
                            from fake.table
                        )
test_table

The results come back as just one identifier token. When I attempt to get the table name from inside the parenthesis all the comes back is test.test_table. What I'm ultimately trying to do is extract both table names test.test_table and fake.table

Does anyone have any insight on how I can go about doing this?

Upvotes: 1

Views: 2035

Answers (1)

MrPaul91
MrPaul91

Reputation: 57

This may help, I've been working with Select SQL statements that contain subqueries in them, so usually the format has been:

Select blah from (select blah from table_name) alias

so firstly I ignore the first Select statement, and look for tokens that contain the word Select:

for item in parsed.tokens:

    if 'SELECT' in identifier.value.upper():
        subquery = identifier.value

The subquery will return

(select blah from table_name) alias

I then have a separate function which removes the outermost parentheses and the alias to give just the subquery script:

def subquery_parsing(subquery, full_tables, tables, alias):
    #print(subquery)
    #new subquery string ready to parse
    res_sub = """"""

    #captures the alias outside the parantheses
    alias = """"""
    
    #record the number of parentheses as they open and close
    paren_cnt = 0


    for char in subquery:
        #if ( and there's already been a ( , include it
        if char == '(' and paren_cnt > 0:
            res_sub += char
    
        #if (, add to the count
        if char == '(':
            paren_cnt += 1
   
        # if ) and there's at least 2 (, include it
        if char == ')' and paren_cnt > 1:
            res_sub += char
          
        # if ), subtract from the count        
        if char == ')':
            paren_cnt -= 1
    
        # capture the script
        if char != '(' and char != ')' and paren_cnt >0:
            res_sub += char
    
        # capture the alias
        if char != '(' and char != ')'  and char != ' ' and paren_cnt == 0:
            alias += char

this returns

select blah from table_name

Which you should then be able to run through the sqlparse.parse again and get the table name.

Upvotes: 1

Related Questions