dfrankow
dfrankow

Reputation: 21407

How to get table names from SQL using select_parser.py from pyparsing?

When I take select_parser.py from pyparsing, and run it with pyparsing 2.2.0 using this code:

query="select z.a, b from test_table left join test2_table where 1=1 and b in (select bb from foo)"
for key, val in select_stmt.parseString(query, parseAll=True).items():
    print "%s: %s" % (key, val)

I get

$ python select_parser.pyparsing.py
where_expr: [['1', '=', '1'], 'AND', ['b', 'IN', ['SELECT', [['bb']], 'FROM', 'foo']]]
from: [['test_table', ['LEFT', 'JOIN'], 'test2_table', []]]
columns: [['z.a'], ['b']]

Although there is a named element "table" in the original definition:

single_source = ( (Group(database_name("database") + "." + table_name("table*")) | table_name("table*")) + 

there is no dict key that comes out with the name "table".

Perhaps the "from" element consumed things first? I don't understand the exact logic of how the named elements get populated, and I haven't gotten a clear idea from reading the docs (several talks, etc.).

How can I use select_parser.py to get all the table names in a SQL query?

Note: the right answer here is a list (or set): test_table, test2_table, foo.

I could go through "from" and ditch lists, but that seems hacky, I don't know if it would work, and it doesn't seem like how pyparsing is supposed to work.

I see this question, this one, and this one, but I don't understand how they help here.

Upvotes: 0

Views: 934

Answers (1)

PaulMcG
PaulMcG

Reputation: 63739

The sample code you linked to includes a call to runTests. This is a great tool to try different test strings, and to write unit tests for your parser.

When inserting your query string in the call to runTests, we get this output:

select z.a, b from test_table left join test2_table where 1=1 and b in (select bb from foo)
['SELECT', [['z.a'], ['b']], 'FROM', ['test_table', ['LEFT', 'JOIN'], 'test2_table', []], 'WHERE', [['1', '=', '1'], 'AND', ['b', 'IN', ['SELECT', [['bb']], 'FROM', 'foo']]]]
- columns: [['z.a'], ['b']]
  [0]:
    ['z.a']
  [1]:
    ['b']
- from: [['test_table', ['LEFT', 'JOIN'], 'test2_table', []]]
  [0]:
    ['test_table', ['LEFT', 'JOIN'], 'test2_table', []]
    - table: [['test_table'], ['test2_table']]
      [0]:
        ['test_table']
      [1]:
        ['test2_table']
- where_expr: [['1', '=', '1'], 'AND', ['b', 'IN', ['SELECT', [['bb']], 'FROM', 'foo']]]
  [0]:
    ['1', '=', '1']
  [1]:
    AND
  [2]:
    ['b', 'IN', ['SELECT', [['bb']], 'FROM', 'foo']]
    [0]:
      b
    [1]:
      IN
    [2]:
      ['SELECT', [['bb']], 'FROM', 'foo']
      - columns: [['bb']]
        [0]:
          ['bb']
      - from: ['foo']
      - table: [['foo']]
        [0]:
          ['foo']

The 'table' names are there, but you'll have to do some navigation of the structure to get to them. Here is one way:

result = select_stmt.parseString(query)
table_names = []
def visit_structure(struct):
    if 'table' in struct:
        table_names.extend(t[0] for t in struct.table)
    for substruct in struct:
        if isinstance(substruct, ParseResults):
            visit_structure(substruct)

visit_structure(result)
print(table_names)

Gives:

['test_table', 'test2_table', 'foo']

For future listing of your parsed data, make use of the ParserElement.runTests and ParseResults.dump methods.

Upvotes: 1

Related Questions