Reputation: 508
a question of mine on a previous post here has expanded a bit so I'm asking a separate question for those who are familiar with regex
.
I have a fairly complicated SQL query syntax where I'd like to find all instances of a string between two other strings in python format.
Simple version setup like this -
qry = '''
with
qry_1 as (
SELECT ID,
NAME
FROM ( ... other code...
),
qry_2 as (
SELECT coalesce (table1.ID, table2.ID) as ID,
NAME
FROM (...other code...
),
qry_3 as (
SELECT id.WEATHER AS WEATHER_MORN,
ROW_NUMBER() OVER(PARTITION BY id.SUN
ORDER BY id.TIME) AS SUN_TIME,
id.RAIN,
id.MIST
FROM (...other code..
)
'''
Where I'd like to find text existing between as (
and from
. I've tried a number of things that don't quite work.
Something like -
re.findall('\w+\s(?i)as\s\(nX(?i)from+', qry)
as part of a larger statement here that I had help with -
find_params = [re.findall('^\w+|(?:NAME|ID)|(?<=\.)(?:NAME|ID)', i)
for i in re.findall('\w+\s(?i)as\s\(nX(?i)from+', qry)]
Upvotes: 0
Views: 51
Reputation: 163447
You could match from as
and as least as possible lines until the next FROM and capture what is in between in a capture group.
^\S+\sas\s*\(\s*((?:\n.*)*?)\n\s*FROM\b
Another option is to match as
and prevent crossing lines that start with either as
as the second word (as in the example data) or start with FROM
using a negative lookahead.
^\S+\sas\s*\([^\S\r\n]*((?:\n(?!\S+ as\b|\s*FROM\b).*)*)\n\s*FROM\b
Note that these patterns are based on the example data, and do not take possible nested structures into account.
The values of the capture group will be returned by re.findall for which you might do sub processing as in the question.
Regex demo | Python demo | |
---|---|---|
1 | https://regex101.com/r/xoRRBi/1 | https://ideone.com/NjmeKO |
2 | https://regex101.com/r/OZMH0A/1 | https://ideone.com/29RO51 |
Upvotes: 1