Reputation: 276
I'm working on a piece of Python code using regular expressions and I'm trying to achieve the following:
If I have a piece of SQL as a string that contains some code in []
and a WHERE
clause in front of it, I want to remove the complete WHERE
clause. For example:
where this condition and [this = 1] group by 1,2,3
becomes
group by 1,2,3
The code I'm using is:
txt = """where [this = 1] group by"""
txt = re.sub("where.*\[.*\].*group" , 'group', txt, flags = re.S|re.I)
However, if I have another WHERE
clause before this one the whole regex doesn't work as expected, e.g.:
txt = """where that and that do that where [this = 1] group by 1,2,3"""
txt = re.sub("where.*\[.*\].*group" , 'group', txt, flags = re.S|re.I)
produces
group by 1,2,3
instead of
where that and that do that group by 1,2,3
Edit: the solution should also work for a scenario like this:
txt = """where that and that do that where [this = 1] and [that = 1] group by 1,2,3"""
outputs:
"""where that and that do that group by 1,2,3"""
So it removes the inner (closest to []
) WHERE
clause and all code containing at least one []
until the next GROUP
, ORDER
or end of string
.
Finally, the solution needs to takes care of cases in which there are multiple such where .. [...]
snippets in the string.
txt = """where [Include_Fakes|TRUE] group by 1 order by 1,3 ) where signed_up_date >= dateadd('[aggregation]', -[Last_N_Periods|12], CURRENT_DATE)"""
expected output:
group by 1 order by 1,3 )
Can somebody point me in the right direction here? Any help would be appreciated.
Upvotes: 1
Views: 83
Reputation: 12503
Here's a way to do that.
exp =r"(where((?!where).)*\[.*?\].*?(?=(group|order)))|(where((?!where).)*\[.*?\].*$)"
txt = """where that and that do that where [this = 1] and [that = 1] group by 1,2,3"""
print(re.sub(exp, "", txt))
# ==> where that and that do that group by 1,2,3
txt = """where that and that do that where [this = 1] group by 1,2,3"""
print(re.sub(exp, "", txt))
# ==> where that and that do that group by 1,2,3
txt = """lots of code where signed_up_date >= dateadd('[aggregation]', -[Last_N_Periods|12], CURRENT_DATE)"""
print(re.sub(exp, "", txt))
# ==> lots of code
txt = """where [Include_Fakes|TRUE] group by 1 order by 1,3 ) where signed_up_date >= dateadd('[aggregation]', -[Last_N_Periods|12], CURRENT_DATE)"""
print(re.sub(exp, "", txt))
# ==> group by 1 order by 1,3 )
txt = """where [condition1] group by 1) where [condition2] group by 2"""
print(re.sub(exp, "", txt))
# ==> group by 1) group by 2
Upvotes: 1
Reputation: 163362
You can match the last occurrence of where using a tempered greedy token and then match any char except [
or ]
.
Then repeat matching from an opening till closing [
]
and repeat that match 1+ times.
\bwhere(?:(?:(?!where)[^][])*\[[^][]*])+\s*
\bwhere
Word boundary and match where(?:
Non capture group
(?:
Non capture group
(?!where)[^][]
Match any char except [
or ]
if on the right is not where
)*
Close group and repeat 0+ times\[[^][]*]
Match 0+ times any char except [
or ]
)+
Close group and repeat 1+ times to match at least one time [
...]
\s*
Match 0+ whitespace charsExample code
import re
regex = r"\bwhere(?:(?:(?!where)[^][])*\[[^][]*])+\s*"
txt = ("where this condition and [this = 1] group by 1,2,3\n"
"where that and that do that where [this = 1] and [that = 1] group by 1,2,3")
result = re.sub(regex, "", txt)
print (result)
Output
group by 1,2,3
where that and that do that group by 1,2,3
Upvotes: 1
Reputation: 27283
You can use negative lookaheads to find the last possible match:
>>> import re
>>> re.sub(r"where((?!where).)*?]\s?", "", "where that and that do that where [this = 1] group by 1,2,3")
'where that and that do that group by 1,2,3'
>>> re.sub(r"where((?!where).)*?]\s?", "", "where this condition and [this = 1] group by 1,2,3")
'group by 1,2,3'
Upvotes: 3