Reputation: 315
I have extracted just the column fields from a query like this
query_split = [query[query.find("select") + len("select"): query.find("from")]]
I get a string like this
query_split = [' service,count(*) as count,round(sum(mrp),2) as sale ']
I want to get a list which looks like this
[' service','count(*) as count','round(sum(mrp),2) as sale']
This is because I want to get the list of column names
['service','count','sale']
I have tried other methods such as
for file in reader(query_split):
print(file)
Gives me the output
[' service', 'count(*) as count', 'round(sum(mrp)', '2) as sale ']
when i took the test case which uses round(sum(mrp),2) type operation in query the below function failed at that point
def get_column_name(query):
"""
Extracts the column name from a sql query
:param query: str
:return: column_name
list: Column names which that query will fetch
"""
column_name=[]
query_split = query[query.find("select") + len("select"): query.find("from")]
for i in query_split.split(','):
if "as" in i:
column_name.append(i.split('as')[-1])
else:
column_name.append(i.split(' ')[-1])
return column_name
Upvotes: 1
Views: 2343
Reputation: 18950
Your problem is that the SQL at play here features nested constructs.
The most likely cleanest solution is to have a SQL parser that understands the MySQL dialect. Arguably, it can be done most easily with ANTLR; you can find a MySQL grammar here and a quick guide here if you are curious.
To approach this with regex we need to account for balanced parenthesis with a recursive regex in a match pattern like this:
[^,]+(\((?>[^()]++|(?1))*+\))[^,]+|([^(),]+(?:,|$))
Explanation:
[^,]+(\((?>[^()]++|(?1))*+\))[^,]+
the recursive regex to match pairs of ()
and everything inbetween (including commas) sourounded by a negated character class that matches everything but a comma.([^(),]+(?:,|$))
matches regular columnsSample Code:
import regex as re
regex = r"[^,]+(\((?>[^()]++|(?1))*+\))[^,]+|([^(),]+(?:,|$))"
test_str = "service,count(*) as count,round(sum(mrp),2) as sale,count(*) as count2,round(sum(mrp),2) as sale2"
matches = re.finditer(regex, test_str, re.MULTILINE)
result = [match.group() for match in matches]
Outputs:
['service,', 'count(*) as count', 'round(sum(mrp),2) as sale', 'count(*) as count2', 'round(sum(mrp),2) as sale2']
Since we are using PCRE regex features you will need to install Python's alternative regex package to run the code. Good luck.
Upvotes: 2