paranormaldist
paranormaldist

Reputation: 508

Python regex multi-line search between strings

a question of mine on a previous post here has expanded a bit so I'm asking a separate question for those who are familiar with regex.

I have a fairly complicated SQL query syntax where I'd like to find all instances of a string between two other strings in python format.

Simple version setup like this -

qry = ''' 
with 
qry_1 as ( 
   SELECT ID, 
          NAME
   FROM   ( ... other code...
),
qry_2 as ( 
    SELECT coalesce (table1.ID, table2.ID) as ID,
           NAME
   FROM (...other code...
),
qry_3 as (
     SELECT id.WEATHER AS WEATHER_MORN,
            ROW_NUMBER() OVER(PARTITION BY id.SUN
                ORDER BY id.TIME) AS SUN_TIME,
            id.RAIN,
            id.MIST
   FROM (...other code..
)
'''

Where I'd like to find text existing between as ( and from. I've tried a number of things that don't quite work.

Something like -

re.findall('\w+\s(?i)as\s\(nX(?i)from+', qry)

as part of a larger statement here that I had help with -

find_params = [re.findall('^\w+|(?:NAME|ID)|(?<=\.)(?:NAME|ID)', i) 
     for i in re.findall('\w+\s(?i)as\s\(nX(?i)from+', qry)]

Upvotes: 0

Views: 51

Answers (1)

The fourth bird
The fourth bird

Reputation: 163447

You could match from as and as least as possible lines until the next FROM and capture what is in between in a capture group.

^\S+\sas\s*\(\s*((?:\n.*)*?)\n\s*FROM\b

Another option is to match as and prevent crossing lines that start with either as as the second word (as in the example data) or start with FROM using a negative lookahead.

^\S+\sas\s*\([^\S\r\n]*((?:\n(?!\S+ as\b|\s*FROM\b).*)*)\n\s*FROM\b

Note that these patterns are based on the example data, and do not take possible nested structures into account.

The values of the capture group will be returned by re.findall for which you might do sub processing as in the question.

Upvotes: 1

Related Questions