Reputation: 133
I'm trying to use RegEx within Python to parse out a function definition and NOTHING else. I keep running into problems though. Is RegEx the right tool to be using here?
For example:
def foo():
    print bar
-- Matches --
a = 2
def foo():
    print bar
-- Doesn't match as there's code above the def --
def foo():
    print bar
a = 2
-- Doesn't match as there's code below the def --
An example of a string I'm trying to parse is "def isPalindrome(x):\n return x == x[::-1]", but in reality the string might contain lines above or below the def itself.
What regular expression would I have to use to achieve this?
Upvotes: 3
Views: 2092
Reputation: 27565
reg = re.compile('((^ *)def \w+\(.*?\): *\r?\n'
'(?: *\r?\n)*'
'\\2( +)[^ ].*\r?\n'
'(?: *\r?\n)*'
'(\\2\\3.*\r?\n(?: *\r?\n)*)*)',
re.MULTILINE)
EDIT
import re
script = '''
def foo():
print bar
a = 2
def foot():
print bar
b = 10
"""
opopo =457
def foor(x):
print bar
print x + 10
def g(u):
print
def h(rt,o):
assert(rt==12)
a = 2
class AZERT(object):
pass
"""
b = 10
def tabulae(x):
\tprint bar
\tprint x + 10
\tdef g(u):
\t\tprint
\tdef h(rt,o):
\t\tassert(rt==12)
a = 2
class Z:
def inzide(x):
print baracuda
print x + 10
def gululu(u):
print
def hortense(rt,o):
assert(rt==12)
def oneline(x): return 2*x
def scroutchibi(h%,n():245sqfg srot b#
'''
reg = re.compile('((?:^[ \t]*)def \w+\(.*\): *(?=.*?[^ \t\n]).*\r?\n)'
'|'
'((^[ \t]*)def \w+\(.*\): *\r?\n'
'(?:[ \t]*\r?\n)*'
'\\3([ \t]+)[^ \t].*\r?\n'
'(?:[ \t]*\r?\n)*'
'(\\3\\4.*\r?\n(?: *\r?\n)*)*)',
re.MULTILINE)
regcom = re.compile('("""|\'\'\')(.+?)\\1',re.DOTALL)
avoided_spans = [ma.span(2) for ma in regcom.finditer(script)]
print 'eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee'
for ma in reg.finditer(script):
    print ma.group(),
    print '--------------------'
    print repr(ma.group())
    print
    try:
        exec(ma.group().strip())
    except:
        print " isn't a valid definition of a function"
    am,bm = ma.span()
    if any(a<=am<=bm<=b for a,b in avoided_spans):
        print ' is a commented definition function'
    print 'eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee'
result
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
def foo():
print bar
--------------------
'def foo():\n print bar\n\n'
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
def foot():
print bar
--------------------
'def foot():\n print bar\n\n'
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
def foor(x):
print bar
print x + 10
def g(u):
print
def h(rt,o):
assert(rt==12)
--------------------
'def foor(x):\n\n\n print bar\n print x + 10\n def g(u):\n print\n\n def h(rt,o):\n assert(rt==12)\n'
is a commented definition function
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
def tabulae(x):
print bar
print x + 10
def g(u):
print
def h(rt,o):
assert(rt==12)
--------------------
'def tabulae(x):\n\n\n\tprint bar\n\tprint x + 10\n\tdef g(u):\n\t\tprint\n\n\tdef h(rt,o):\n\t\tassert(rt==12)\n'
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
def inzide(x):
print baracuda
print x + 10
def gululu(u):
print
def hortense(rt,o):
assert(rt==12)
--------------------
' def inzide(x):\n\n\n print baracuda\n print x + 10\n def gululu(u):\n print\n\n def hortense(rt,o):\n assert(rt==12)\n\n\n\n'
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
def oneline(x): return 2*x
--------------------
'def oneline(x): return 2*x\n'
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
def scroutchibi(h%,n():245sqfg srot b#
--------------------
'def scroutchibi(h%,n():245sqfg srot b#\n'
isn't a valid definition of a function
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
Upvotes: 2
Reputation: 57599
No, regular expressions are not the right tool for this job. This is similar to people desperately trying to parse HTML with regular expressions. These languages are not regular, so no regular expression can cover every quirk you will encounter.
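To make the limitation concrete, here is a small sketch. The pattern `naive_def` and the sample source are made up for illustration and are not taken from any answer here. A simple line-anchored pattern reports a `def` that only exists as text inside a string literal:

```python
import re

# Hypothetical sample: the second "def" is text inside a string, not code.
source = (
    "def real(x):\n"
    "    return x\n"
    "\n"
    "doc = '''\n"
    "def fake(y):\n"
    "    return y\n"
    "'''\n"
)

# A naive pattern (for illustration only): any line starting with "def name(".
naive_def = re.compile(r"^[ \t]*def (\w+)\(", re.MULTILINE)

names = naive_def.findall(source)
print(names)  # ['real', 'fake']
```

The pattern has no way of knowing that `fake` sits inside a string unless it also models string literals, comments, and nesting, which is what a parser already does.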
Use a real parser instead: build a parse tree and work with the definition nodes. The built-in parser module can do this, but it was removed in Python 3.10, and the ast module is far more convenient anyway. An example:
import ast
mdef = 'def foo(x): return 2*x'
a = ast.parse(mdef)
# Collect every function definition node in the tree
definitions = [n for n in ast.walk(a) if isinstance(n, ast.FunctionDef)]
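The check from the question (the string holds one function definition and nothing else) could be sketched along the same lines. The helper name `is_only_function_def` is hypothetical, and the sketch assumes the input is valid syntax for the interpreter running it, so Python 2 statements such as `print bar` would be rejected under Python 3.

```python
import ast


def is_only_function_def(source):
    """Return True if source parses to exactly one top-level function definition."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Not valid Python for this interpreter, so not a valid definition either.
        return False
    body = tree.body
    return len(body) == 1 and isinstance(body[0], ast.FunctionDef)


print(is_only_function_def("def isPalindrome(x):\n return x == x[::-1]"))  # True
print(is_only_function_def("a = 2\ndef foo():\n    return 1"))            # False
print(is_only_function_def("def foo():\n    return 1\na = 2"))            # False
```

Checking `tree.body` rather than walking the whole tree is deliberate: nested functions inside the single definition are still accepted, while any extra top-level statement above or below it makes the check fail.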
Upvotes: 9