Reputation: 65
I'm looking to find and modify some sql syntax around the convert function. I want basically any convert(A,B) or CONVERT(A,B) in all my files to be selected and converted to B::A.
So far I tried selecting them with re.findall(r"\bconvert\b\(.*?,.*\)", l, re.IGNORECASE)
But it's only returning a small selection out of what I want and I also have trouble actually manipulating the A/B I mentioned.
For example, a sample line (note the nested structure here is irrelevant, I'm only getting the outer layer working if possible)
convert(varchar, '/' || convert(nvarchar, es.Item_ID) || ':' || convert(nvarchar, o.Option_Number) || '/') as LocPath
...should become...
'/' || es.Item_ID::nvarchar || ':' || o.Option_Number::nvarchar || '/' :: varchar as LocPath
Example2:
SELECT LocationID AS ItemId, convert(bigint, -1),
...should become...
SELECT LocationID AS ItemId, -1::bigint,
I think this should be possible with some kind of re.sub with groups and currently have a code structure inside a for each loop where line is the each line in the file:
matchConvert = ["convert(", "CONVERT("]
a = next((a for a in matchConvert if a in line), False)
if a:
print("convert() line")
#line = re.sub(re.escape(a) + r'', '', line)
Edit: In the end I went with a non re solution and handled each line by identifying each block and manipulate them accordingly.
Upvotes: 2
Views: 224
Reputation: 65
Here's my solution based on @Иван-Балван's code. Breaking this structure into blocks makes further specification a lot easier than I previously thought and I'll be using this method for a lot of other operations as well.
# Check for balanced brackets
def checkBracket(my_string):
count = 0
for c in my_string:
if c == "(":
count+=1
elif c == ")":
count-=1
return count
# Modify the first convert in line
# Based on suggestions from stackoverflow.com/questions/73040953
def modifyConvert(l):
# find the location of convert()
count = l.index('convert(')
# select the group before convert() call
before = l[:count]
group=""
n1=0
n2=0
A=""
B=""
operate = False
operators = ["|", "<", ">", "="]
# look for A group before comma
for n1, i in enumerate(l[count+8:], start=len(before)+8):
# find current position in l
checkIndex = checkBracket(l[count+8:][:n1-len(before)-8])
if i == ',' and checkIndex == 0:
A = group
break
group += i
# look for B group after comma
group = ""
for n2, i in enumerate(l[n1+1:], start=n1+1):
checkIndex = checkBracket(l[count+n1-len(before):][:n2-n1+1])
if i == ',' and checkIndex == 0:
return l
elif checkIndex < 0:
B = group
break
group += i
# mark operators
if i in operators:
operate = True
# select the group after convert() call
after = l[n2+1:]
# (B) if it contains operators
if operate:
return before + "(" + B.lstrip() + ') :: ' + A + after
else:
return before + B.lstrip() + '::' + A + after
# Modify cast syntax with convert(a,b). return line.
def convertCast(l):
# Call helper for nested cases
i = l.count('convert(')
while i>0:
i -= 1
l = modifyConvert(l)
return l
Upvotes: 0
Reputation: 598
The task:
Swap the parameters of all the 'convert' functions in the given string. Parameters can contain any character, including nested 'convert' functions.
A solution based on the re
module:
def convert_re(s):
import re
start,part_1,part_2,end=re.search(r'''
(.*?)
convert\(
([^,)(]+\(.+?\)[^,)(]*|[^,)(]+)
,
([^,)(]+\(.+?\)[^,)(]*|[^,)(]+)
\)
(.*)
''',s,re.X).groups()
result=start+part_2.lstrip()+' :: '+part_1+end
return result
def multi_convert_re(s):
converts=s.count('convert')
for n in range(converts):
s=convert_re(s)
return s
Discription of the 'convert_re' function:
Regular expression:
start is the first group with what comes before 'convert'
Then follows convert\()
which has no group and contains the name of the function and the opening '('
part_1 is the second group ([^,)(]+\(.+?\)[^,)(]*|[^,)(]+)
. This should match the first parameter. It can be anything except - ,)(
, or a function preceded by anything except ,)(
, optionally followed by anything except ,)(
and with anything inside (except a new line)
Then follows a comma ,
, which has no group
part_2 is the third group and it acts like the second, but should catch everything what's left inside the external function
Then follows )
, which has no group
end is the fourth group (.*)
with what's left before the new line.
The resulting string is then created by swapping part_1
and part_2
, putting ' :: ' between them, removing spaces on the left from part_2
and adding start
to the beginning and end
to the end.
Description of the 'multi_convert_re' function
Repeatedly calls 'convert_re' function until there are no "convert" left.
Notes:
re
module. It may turn out that the results will be different.Upvotes: 1
Reputation: 598
The task:
Swap the parameters of all the 'convert' functions in this given. Parameters can contain any character, including nested 'convert' functions.
A solution:
def convert_py(s):
#capturing start:
left=s.index('convert')
start=s[:left]
#capturing part_1:
c=0
line=''
for n1,i in enumerate(s[left+8:],start=len(start)+8):
if i==',' and c==0:
part_1=line
break
if i==')':
c-=1
if i=='(':
c+=1
line+=i
#capturing part_2:
c=0
line=''
for n2,i in enumerate(s[n1+1:],start=n1+1):
if i==')':
c-=1
if i=='(':
c+=1
if c<0:
part_2=line
break
line+=i
#capturing end:
end=s[n2+1:]
#capturing result:
result=start+part_2.lstrip()+' :: '+part_1+end
return result
def multi_convert_py(s):
converts=s.count('convert')
for n in range(converts):
s=convert_py(s)
return s
Notes:
re
module, which is presented in another answer - this version should not fail if there are more than two parameters in the 'convert' function in the given string. However, it will swap them only once, for example: convert(a,b, c)
--> b, c : a
Upvotes: 1
Reputation: 434
This may be an X/Y problem, meaning you’re asking how to do something with Regex that may be better solved with parsing (meaning using/modifying/writing a SQL parser). An indication that this is the case is the fact that “convert” calls can be nested. I’m guessing Regex is going to be more of a headache than it’s worth here in the long run if you’re working with a lot of files and they’re at all complicated.
Upvotes: 1