Strings

Reputation: 133

How do I match a Python function definition (and nothing else) with RegEx?

I'm trying to use RegEx within Python to parse out a function definition and NOTHING else. I keep running into problems though. Is RegEx the right tool to be using here?

For example:

def foo():
  print bar
-- Matches --

a = 2
def foo():
  print bar
-- Doesn't match as there's code above the def --

def foo():
  print bar
a = 2
-- Doesn't match as there's code below the def --

An example of a string I'm trying to parse is "def isPalindrome(x):\n return x == x[::-1]". But in reality that might contain lines above or below the def itself.

What regular expression would I have to use to achieve this?

Upvotes: 3

Views: 2092

Answers (2)

eyquem

Reputation: 27565

import re

reg = re.compile(r'((^ *)def \w+\(.*?\): *\r?\n'
                 r'(?: *\r?\n)*'
                 r'\2( +)[^ ].*\r?\n'
                 r'(?: *\r?\n)*'
                 r'(\2\3.*\r?\n(?: *\r?\n)*)*)',
                 re.MULTILINE)

EDIT

import re
script = '''
def foo():
  print bar

a = 2
def foot():
  print bar

b = 10
"""
opopo =457
def foor(x):


  print bar
  print x + 10
  def g(u):
    print

  def h(rt,o):
    assert(rt==12)
a = 2
class AZERT(object):
   pass
"""


b = 10
def tabulae(x):


\tprint bar
\tprint x + 10
\tdef g(u):
\t\tprint

\tdef h(rt,o):
\t\tassert(rt==12)
a = 2


class Z:
    def inzide(x):


      print baracuda
      print x + 10
      def gululu(u):
        print

      def hortense(rt,o):
        assert(rt==12)



def oneline(x): return 2*x


def scroutchibi(h%,n():245sqfg srot b#

'''

.

reg = re.compile(r'((?:^[ \t]*)def \w+\(.*\): *(?=.*?[^ \t\n]).*\r?\n)'
                 r'|'
                 r'((^[ \t]*)def \w+\(.*\): *\r?\n'
                 r'(?:[ \t]*\r?\n)*'
                 r'\3([ \t]+)[^ \t].*\r?\n'
                 r'(?:[ \t]*\r?\n)*'
                 r'(\3\4.*\r?\n(?: *\r?\n)*)*)',
                 re.MULTILINE)

regcom = re.compile('("""|\'\'\')(.+?)\\1',re.DOTALL)


avoided_spans = [ma.span(2) for ma in regcom.finditer(script)]

print 'eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee'
for ma in  reg.finditer(script):
    print ma.group(),
    print '--------------------'
    print repr(ma.group())
    print
    try:
        exec(ma.group().strip())
    except Exception:
        print "   isn't a valid definition of a function"
    am,bm = ma.span()
    if any(a<=am<=bm<=b for a,b in avoided_spans):
        print '   is a commented definition function' 

    print 'eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee'

result

eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
def foo():
  print bar

--------------------
'def foo():\n  print bar\n\n'

eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
def foot():
  print bar

--------------------
'def foot():\n  print bar\n\n'

eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
def foor(x):


  print bar
  print x + 10
  def g(u):
    print

  def h(rt,o):
    assert(rt==12)
--------------------
'def foor(x):\n\n\n  print bar\n  print x + 10\n  def g(u):\n    print\n\n  def h(rt,o):\n    assert(rt==12)\n'

   is a commented definition function
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
def tabulae(x):


    print bar
    print x + 10
    def g(u):
        print

    def h(rt,o):
        assert(rt==12)
--------------------
'def tabulae(x):\n\n\n\tprint bar\n\tprint x + 10\n\tdef g(u):\n\t\tprint\n\n\tdef h(rt,o):\n\t\tassert(rt==12)\n'

eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
    def inzide(x):


      print baracuda
      print x + 10
      def gululu(u):
        print

      def hortense(rt,o):
        assert(rt==12)



--------------------
'    def inzide(x):\n\n\n      print baracuda\n      print x + 10\n      def gululu(u):\n        print\n\n      def hortense(rt,o):\n        assert(rt==12)\n\n\n\n'

eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
def oneline(x): return 2*x
--------------------
'def oneline(x): return 2*x\n'

eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
def scroutchibi(h%,n():245sqfg srot b#
--------------------
'def scroutchibi(h%,n():245sqfg srot b#\n'

   isn't a valid definition of a function
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee

Upvotes: 2

nemo

Reputation: 57599

No, regular expressions are not the right tool for this job. This is similar to people desperately trying to parse HTML with regular expressions. Neither Python nor HTML is a regular language, so no regular expression can handle every quirk you will encounter.
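
To illustrate one such quirk, here is a minimal sketch assuming Python 3. The pattern NAIVE_DEF is a hypothetical, deliberately simple regex, not one taken from the question or the other answer. It reports a function definition in source that contains none, because the "def" sits inside a string literal:

```python
import re

# Hypothetical naive pattern: a line starting with "def name(...):".
NAIVE_DEF = re.compile(r"^def \w+\(.*\):", re.MULTILINE)

# This source only assigns a string; it defines no function at all.
source = 'text = """\ndef fake():\n    pass\n"""\n'

# The regex sees only characters, so it matches the line inside the string.
regex_hit = NAIVE_DEF.search(source) is not None

# Run the source in an empty namespace to see what it really defines.
namespace = {}
exec(source, namespace)
really_defined = "fake" in namespace

print(regex_hit, really_defined)
```

The regex matches even though running the source never creates a function named fake. Telling the two cases apart requires tracking string literals, which amounts to parsing the language.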

Use the built-in parser module (deprecated in Python 3.9 and removed in 3.10) to build a parse tree, then look for the function definition nodes and work with those instead. Better still, use the ast module, which is far more convenient. An example:

import ast

mdef = 'def foo(x): return 2*x'
a = ast.parse(mdef)
# Collect every function definition node, including nested ones.
definitions = [n for n in ast.walk(a) if isinstance(n, ast.FunctionDef)]
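
The question asks for a definition "and nothing else", so a stricter check may be closer to what is wanted. The following is a minimal sketch assuming Python 3. The helper name is_single_function_def is made up for illustration. It accepts a string only if it parses and its top level consists of exactly one function definition:

```python
import ast


def is_single_function_def(source):
    """Return True if source is exactly one top-level function definition."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Text that does not parse cannot be a valid definition.
        return False
    # tree.body holds the top-level statements of the module.
    return len(tree.body) == 1 and isinstance(tree.body[0], ast.FunctionDef)


print(is_single_function_def("def isPalindrome(x):\n return x == x[::-1]"))
print(is_single_function_def("a = 2\ndef foo():\n  return 1"))
print(is_single_function_def("def foo():\n  return 1\na = 2"))
```

Under these assumptions, only the first string is accepted. Comments and blank lines around the def do not count as statements, so they are allowed. Decorators belong to the FunctionDef node itself. An "async def" parses to ast.AsyncFunctionDef and would be rejected unless that type is added to the isinstance check.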

Upvotes: 9
