john_jerome
john_jerome

Reputation: 276

Replace the inner occurrence of an expression

I'm working on a piece of Python code using regular expressions and I'm trying to achieve the following:

If I have a piece of SQL as a string that contains some code in [] and a WHERE clause in front of it, I want to remove the complete WHERE clause. For example:

where this condition and [this = 1] group by 1,2,3

becomes

group by 1,2,3

The code I'm using is:

txt = """where [this = 1] group by"""

txt = re.sub("where.*\[.*\].*group" , 'group', txt, flags = re.S|re.I)    

However, if I have another WHERE clause before this one the whole regex doesn't work as expected, e.g.:

txt = """where that and that do that where [this = 1] group by 1,2,3"""

txt = re.sub("where.*\[.*\].*group" , 'group', txt, flags = re.S|re.I)  

produces

group by 1,2,3

instead of

where that and that do that group by 1,2,3

Edit: the solution should also work for a scenario like this:

txt = """where that and that do that where [this = 1] and [that = 1] group by 1,2,3"""

outputs:

"""where that and that do that group by 1,2,3"""

So it removes the inner (closest to []) WHERE clause and all code containing at least one [] until the next GROUP, ORDER or end of string.

Finally, the solution needs to takes care of cases in which there are multiple such where .. [...] snippets in the string.

txt = """where [Include_Fakes|TRUE] group by 1 order by 1,3 ) where signed_up_date >= dateadd('[aggregation]', -[Last_N_Periods|12], CURRENT_DATE)"""

expected output:
group by 1 order by 1,3 ) 

Can somebody point me in the right direction here? Any help would be appreciated.

Upvotes: 1

Views: 83

Answers (3)

Roy2012
Roy2012

Reputation: 12503

Here's a way to do that.

exp =r"(where((?!where).)*\[.*?\].*?(?=(group|order)))|(where((?!where).)*\[.*?\].*$)"

txt = """where that and that do that where [this = 1] and [that = 1] group by 1,2,3"""
print(re.sub(exp, "", txt))
# ==> where that and that do that group by 1,2,3

txt = """where that and that do that where [this = 1] group by 1,2,3"""
print(re.sub(exp, "", txt))
# ==> where that and that do that group by 1,2,3

txt = """lots of code where signed_up_date >= dateadd('[aggregation]', -[Last_N_Periods|12], CURRENT_DATE)""" 
print(re.sub(exp, "", txt))
# ==> lots of code 

txt = """where [Include_Fakes|TRUE] group by 1 order by 1,3 ) where signed_up_date >= dateadd('[aggregation]', -[Last_N_Periods|12], CURRENT_DATE)"""
print(re.sub(exp, "", txt))

# ==> group by 1 order by 1,3 ) 

txt =  """where [condition1] group by 1) where [condition2] group by 2"""
print(re.sub(exp, "", txt))

# ==> group by 1) group by 2

Upvotes: 1

The fourth bird
The fourth bird

Reputation: 163362

You can match the last occurrence of where using a tempered greedy token and then match any char except [ or ].

Then repeat matching from an opening till closing [ ] and repeat that match 1+ times.

\bwhere(?:(?:(?!where)[^][])*\[[^][]*])+\s*
  • \bwhere Word boundary and match where
  • (?: Non capture group
    • (?: Non capture group
      • (?!where)[^][] Match any char except [ or ] if on the right is not where
    • )* Close group and repeat 0+ times
    • \[[^][]*] Match 0+ times any char except [ or ]
  • )+ Close group and repeat 1+ times to match at least one time [...]
  • \s* Match 0+ whitespace chars

Regex demo | Python demo

Example code

import re
 
regex = r"\bwhere(?:(?:(?!where)[^][])*\[[^][]*])+\s*"
txt = ("where this condition and [this = 1] group by 1,2,3\n"
    "where that and that do that where [this = 1] and [that = 1] group by 1,2,3")
result = re.sub(regex, "", txt)
 
print (result)

Output

group by 1,2,3
where that and that do that group by 1,2,3

Upvotes: 1

L3viathan
L3viathan

Reputation: 27283

You can use negative lookaheads to find the last possible match:

>>> import re
>>> re.sub(r"where((?!where).)*?]\s?", "", "where that and that do that where [this = 1] group by 1,2,3")
'where that and that do that group by 1,2,3'
>>> re.sub(r"where((?!where).)*?]\s?", "", "where this condition and [this = 1] group by 1,2,3")
'group by 1,2,3'

Demo

Upvotes: 3

Related Questions