Reputation: 23
For the following piece of C source code:
for (j=0; j<len; j++) a = (s) + (4); test = 5;
I want to use Python's re module to insert \n after every semicolon (;) except those inside parentheses.
The regex ;(?![^(]*\)) works on the first piece of code, but not on this one:
for (j=0; j<(len); (j++)) a = (s) + (4); test = 5;
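Running that regex on both pieces shows where it breaks down. This is a quick sketch with the standard re module: the lookahead (?![^(]*\)) only asks whether a ) comes before the next (. In the second piece the nested (len) puts a ( first, so both semicolons of the for header are treated as top-level semicolons.

```python
import re

# The lookahead from the question: match ';' unless a ')' follows
# before the next '('.
pattern = re.compile(r";(?![^(]*\))")

piece1 = "for (j=0; j<len; j++) a = (s) + (4); test = 5;"
piece2 = "for (j=0; j<(len); (j++)) a = (s) + (4); test = 5;"

out1 = pattern.sub(";\n", piece1)
out2 = pattern.sub(";\n", piece2)

print(out1)  # the for-loop header stays on one line
print(out2)  # the nested "(len)" makes both header semicolons match
```

Note that the space after each replaced ; survives, because the pattern only consumes the semicolon itself.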
Upvotes: 2
Views: 166
Reputation: 33
You need to count opened and closed brackets as the regex matches them, and only insert the newline when every opened bracket has been closed again, i.e. when the count is zero. This is done in replacement(), which re.sub calls on each match of the regex. The regex searches for "(" and ")" only to update the count, and for ";" to decide whether to leave it alone or insert a newline after it.
import re

def split_statements(code):
    """Insert a newline after every ';' that is not inside brackets."""
    bracket_count = 0

    def replacement(match):
        nonlocal bracket_count
        matched_char = match.group(1)
        if "(" in matched_char:
            bracket_count += 1
            # don't replace, just return what was found
            return matched_char
        elif ")" in matched_char:
            bracket_count -= 1
            # don't replace, just return what was found
            return matched_char
        else:
            # ";" (possibly followed by a space)
            if bracket_count == 0:
                # outside all brackets: insert \n and drop the space
                return ";\n"
            # inside brackets: leave it intact, including the space
            return matched_char

    # "(", ")" or ";", each optionally followed by one space
    return re.sub(r"([();] ?)", replacement, code)

# example 1
code = "for (j=0; j<len; j++) a = (s) + (4); test = 5;"
print(code)
print(split_statements(code))
# example 2
code = "for (j=0; j<(len); (j++)) a = (s) + (4); test = 5;"
print(code)
print(split_statements(code))
# example 3
code = "for (j=0; j<len; j++) test = 5; a = (s) + (4);"
print(code)
print(split_statements(code))
Result:
for (j=0; j<len; j++) a = (s) + (4); test = 5;
for (j=0; j<len; j++) a = (s) + (4);
test = 5;
for (j=0; j<(len); (j++)) a = (s) + (4); test = 5;
for (j=0; j<(len); (j++)) a = (s) + (4);
test = 5;
for (j=0; j<len; j++) test = 5; a = (s) + (4);
for (j=0; j<len; j++) test = 5;
a = (s) + (4);
Upvotes: 2
Reputation: 22457
Use a custom replacement function:
re.sub(pattern, repl, string, count=0, flags=0)
...
If repl is a function, it is called for every non-overlapping occurrence of pattern.
The function repl is called for every single ; and for every parenthesized expression. Because .* is greedy, the very first opening parenthesis starts a match that runs all the way to the last closing parenthesis, and since re.sub does not find overlapping matches, any ; inside that span is never matched on its own.
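Listing what the pattern actually matches makes the greedy behavior visible. This is a small sketch using re.findall, which, for a pattern with exactly one group, returns that group for every non-overlapping match, i.e. the same strings repl receives as m.group(1):

```python
import re

str1 = 'for (j=0; j<len; j++) a = (s) + (4); test = 5;'

# Group 1 of every non-overlapping match, in order.
pieces = re.findall(r'(;\s*|\(.*\))', str1)
print(pieces)
# The greedy '.*' runs from the first '(' to the last ')', so the two
# semicolons of the for header are never matched on their own.
```

Only three matches come back: the whole span from the for header's ( up to (4), then "; ", then the final ;.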
import re

def repl(m):
    contents = m.group(1)
    if '(' in contents:
        # a parenthesized span: copy it unchanged
        return contents
    # a top-level ';' plus any whitespace after it: end the line here
    return ';\n'

str1 = 'for (j=0; j<len; j++) a = (s) + (4); test = 5;'
str2 = 'for (j=0; j<(len); (j++)) a = (s) + (4); test = 5;'
print(re.sub(r'(;\s*|\(.*\))', repl, str1))
print(re.sub(r'(;\s*|\(.*\))', repl, str2))
Result:
for (j=0; j<len; j++) a = (s) + (4);
test = 5;
for (j=0; j<(len); (j++)) a = (s) + (4);
test = 5;
Mission accomplished, for your (very little) sample data.
But wait!
A small but valid change to one of the examples
str1 = 'for (j=0; j<len; j++) test = 5; a = (s) + (4);'
breaks this: the ; after test = 5 lies between the first ( and the last ), so the greedy match swallows it and no newline is inserted:
for (j=0; j<len; j++) test = 5; a = (s) + (4);
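A lazy quantifier, \(.*?\), is the obvious next thing to try. It is not part of this answer and is shown here only as a sketch of where it ends up: it repairs the modified example, because each ( ... ) now stops at the nearest ), but on the nested example it stops at the ) of (len), so the ; right after it is treated as a top-level semicolon.

```python
import re

def repl(m):
    contents = m.group(1)
    if '(' in contents:
        return contents      # parenthesized text is copied unchanged
    return ';\n'             # a top-level ';' ends the line

# Same alternation as above, but with a lazy '.*?' inside the parentheses.
lazy = r'(;\s*|\(.*?\))'

modified = 'for (j=0; j<len; j++) test = 5; a = (s) + (4);'
nested = 'for (j=0; j<(len); (j++)) a = (s) + (4); test = 5;'

out_modified = re.sub(lazy, repl, modified)  # works: each ( ... ) is its own match
out_nested = re.sub(lazy, repl, nested)      # breaks: '(j=0; j<(len)' ends too early
print(out_modified)
print(out_nested)
```

The second print splits the for header: for (j=0; j<(len); ends up on one line and (j++)) a = (s) + (4); on the next.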
There is no way around it; you need a state machine that tracks the nesting depth instead:
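def state_match(text):
    parentheses = 0      # current nesting depth of ( ... )
    drop_space = False   # True right after a ';' that ended a statement
    result = ''
    for character in text:
        # only the character directly after an inserted newline may be dropped
        after_newline, drop_space = drop_space, False
        if character == '(':
            parentheses += 1
            result += '('
        elif character == ')':
            parentheses -= 1
            result += ')'
        elif character == ' ':
            # skip one space directly after an inserted newline
            if not after_newline:
                result += ' '
        elif character == ';':
            if parentheses:
                # inside parentheses: keep the ';' as it is
                result += character
            else:
                # at top level: end the statement here
                result += ';\n'
                drop_space = True
        else:
            result += character
    return result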
str1 = 'for (j=0; j<len; j++) a = (s) + (4); test = 5;'
str2 = 'for (j=0; j<(len); (j++)) a = (s) + (4); test = 5;'
str3 = 'for (j=0; j<len; j++) test = 5; a = (s) + (4);'
print(state_match(str1))
print(state_match(str2))
print(state_match(str3))
which correctly results in:
for (j=0; j<len; j++) a = (s) + (4);
test = 5;
for (j=0; j<(len); (j++)) a = (s) + (4);
test = 5;
for (j=0; j<len; j++) test = 5;
a = (s) + (4);
Upvotes: 1