Reputation: 21407
When I take select_parser.py from pyparsing, and run it with pyparsing 2.2.0 using this code:
query="select z.a, b from test_table left join test2_table where 1=1 and b in (select bb from foo)"
for key, val in select_stmt.parseString(query, parseAll=True).items():
print "%s: %s" % (key, val)
I get
$ python select_parser.pyparsing.py
where_expr: [['1', '=', '1'], 'AND', ['b', 'IN', ['SELECT', [['bb']], 'FROM', 'foo']]]
from: [['test_table', ['LEFT', 'JOIN'], 'test2_table', []]]
columns: [['z.a'], ['b']]
Although there is a named element "table" in the original definition:
single_source = ( (Group(database_name("database") + "." + table_name("table*")) | table_name("table*")) +
there is no dict key that comes out with the name "table".
Perhaps the "from" element consumed things first? I don't understand the exact logic of how the named elements get populated, and I haven't gotten a clear idea from reading the docs (several talks, etc.).
How can I use select_parser.py to get all the table names in a SQL query?
Note: the right answer here is a list (or set): test_table, test2_table, foo.
I could go through "from" and ditch lists, but that seems hacky, I don't know if it would work, and it doesn't seem like how pyparsing is supposed to work.
I see this question, this one, and this one, but I don't understand how they help here.
Upvotes: 0
Views: 934
Reputation: 63739
The sample code you linked to includes a call to runTests
. This is a great tool to try different test strings, and to write unit tests for your parser.
When inserting your query string in the call to runTests
, we get this output:
select z.a, b from test_table left join test2_table where 1=1 and b in (select bb from foo)
['SELECT', [['z.a'], ['b']], 'FROM', ['test_table', ['LEFT', 'JOIN'], 'test2_table', []], 'WHERE', [['1', '=', '1'], 'AND', ['b', 'IN', ['SELECT', [['bb']], 'FROM', 'foo']]]]
- columns: [['z.a'], ['b']]
[0]:
['z.a']
[1]:
['b']
- from: [['test_table', ['LEFT', 'JOIN'], 'test2_table', []]]
[0]:
['test_table', ['LEFT', 'JOIN'], 'test2_table', []]
- table: [['test_table'], ['test2_table']]
[0]:
['test_table']
[1]:
['test2_table']
- where_expr: [['1', '=', '1'], 'AND', ['b', 'IN', ['SELECT', [['bb']], 'FROM', 'foo']]]
[0]:
['1', '=', '1']
[1]:
AND
[2]:
['b', 'IN', ['SELECT', [['bb']], 'FROM', 'foo']]
[0]:
b
[1]:
IN
[2]:
['SELECT', [['bb']], 'FROM', 'foo']
- columns: [['bb']]
[0]:
['bb']
- from: ['foo']
- table: [['foo']]
[0]:
['foo']
The 'table' names are there, but you'll have to do some navigation of the structure to get to them. Here is one way:
result = select_stmt.parseString(query)
table_names = []
def visit_structure(struct):
if 'table' in struct:
table_names.extend(t[0] for t in struct.table)
for substruct in struct:
if isinstance(substruct, ParseResults):
visit_structure(substruct)
visit_structure(result)
print(table_names)
Gives:
['test_table', 'test2_table', 'foo']
For future listing of your parsed data, make use of the ParserElement.runTests
and ParseResults.dump
methods.
Upvotes: 1