TheObserver

Reputation: 3021

Recursive nested expression in Python

I am using Python 2.6.4.

I have a series of select statements in a text file and I need to extract the field names from each select query. This would be easy if some of the fields didn't use nested functions like to_char() etc.

Given select statement fields that may contain several levels of nested parentheses, like "ltrim(rtrim(to_char(base_field_name, format))) renamed_field_name,", or the simple case of just "base_field_name", is it possible to use Python's re module to write a regex that extracts base_field_name? If so, what would the regex look like?

Upvotes: 3

Views: 2201

Answers (6)

user97370

Here's a really hacky parser that does what you want.

It works by calling 'eval' on the text to be parsed, mapping all identifiers to a function which returns its first argument (which I'm guessing is what you want given your example).

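class FakeFunction(object):
    # Stands in for any identifier: a field name or a SQL function.
    def __init__(self, name):
        self.name = name
    def __call__(self, *args):
        # "Calling" a function just hands back its first argument.
        return args[0]
    def __str__(self):
        return self.name

class FakeGlobals(dict):
    # Every name lookup during eval yields a FakeFunction of that name.
    def __getitem__(self, x):
        return FakeFunction(x)

def ExtractBaseFieldName(x):
    # eval executes the text, so only use this on input you trust.
    return eval(x, FakeGlobals())

print(ExtractBaseFieldName('ltrim(rtrim(to_char(base_field_name, format)))'))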
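
The same "follow the first argument" idea can be sketched without executing the text, using the standard ast module instead of eval. This is not the code above, just a variant of it: the function name is made up for illustration, and it shares the assumption that the field expression happens to be valid Python syntax (so a trailing alias such as renamed_field_name has to be stripped first).

```python
import ast

def extract_base_field_name_ast(expr):
    """Follow the first argument of each nested call down to a plain name.

    Hypothetical helper: assumes expr parses as a Python expression.
    """
    node = ast.parse(expr.strip(), mode='eval').body
    # Descend through calls such as ltrim(rtrim(...)), always taking argument 0.
    while isinstance(node, ast.Call) and node.args:
        node = node.args[0]
    if isinstance(node, ast.Name):
        return node.id
    raise ValueError('no plain field name found in %r' % expr)

print(extract_base_field_name_ast('ltrim(rtrim(to_char(base_field_name, format)))'))
```

Nothing in the input is ever called here, which makes this a little less hacky for untrusted text, but it is still borrowing Python's grammar to read SQL.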

Upvotes: 1

Ricardo Cárdenes

Reputation: 9172

Do you really need regular expressions? To get the one you've got up there I'd use

  s[s.rfind('(')+1:s.find(')')].split(',')[0]

with 's' containing the original string.

Of course, it's not a general solution, but...
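
For what it's worth, here is the slicing idea wrapped in a function as a minimal sketch. The name innermost_first_arg and the guard for fields without parentheses are additions for illustration: the bare expression would chop the last character off a plain "base_field_name", because rfind and find both return -1 there. It still assumes one chain of nested calls.

```python
def innermost_first_arg(s):
    """Return the first argument of the innermost call, or the bare field name."""
    open_pos = s.rfind('(')
    if open_pos == -1:
        # No call at all: take the first token and drop a trailing comma.
        tokens = s.split()
        return tokens[0].rstrip(',') if tokens else ''
    close_pos = s.find(')', open_pos)
    if close_pos == -1:
        # Unbalanced input: read to the end of the string.
        close_pos = len(s)
    return s[open_pos + 1:close_pos].split(',')[0].strip()

print(innermost_first_arg('ltrim(rtrim(to_char(base_field_name, format))) renamed_field_name,'))
print(innermost_first_arg('base_field_name'))
```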

Upvotes: 0

mtmk

Reputation: 6326

This may be good enough:

import re
print(re.match(r".*\(([^\)]+)\)", "ltrim(to_char(field_name, format))").group(1))

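You would need to do further processing. For example, capture the function name as well and pull out the field name according to that function's signature (the \b keeps the greedy .* from swallowing all but the last letter of the function name):

.*\b(\w+)\(([^\)]+)\)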
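
A rough sketch of that further processing, assuming you keep a small table saying which argument of each function holds the field. The table contents, the helper name and the default of "first argument" for unknown functions are all assumptions for illustration, not anything taken from your queries.

```python
import re

# Hypothetical table: which argument of each SQL function is the field.
FIELD_ARG_INDEX = {'to_char': 0, 'ltrim': 0, 'rtrim': 0, 'nvl': 0}

# Innermost call: a name followed by parentheses with no parentheses inside.
INNERMOST_CALL = re.compile(r'(\w+)\(([^()]*)\)')

def field_from_innermost_call(expr):
    match = INNERMOST_CALL.search(expr)
    if match is None:
        # Plain field, possibly followed by an alias and/or a comma.
        tokens = expr.split()
        return tokens[0].rstrip(',') if tokens else ''
    func_name = match.group(1).lower()
    args = [arg.strip() for arg in match.group(2).split(',')]
    index = FIELD_ARG_INDEX.get(func_name, 0)  # unknown function: guess the first argument
    return args[index] if index < len(args) else args[0]

print(field_from_innermost_call('ltrim(rtrim(to_char(base_field_name, format)))'))
```

This only ever looks at the innermost call, so an expression with two sibling calls would still need a real parser.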

Upvotes: 1

Attila O.

Reputation: 16625

>>> import re
>>> string = 'ltrim(rtrim(to_char(base_field_name, format))) renamed_field_name'
>>> rx = re.compile(r'^(.*?\()*(.+?)(,.*?)*(,|\).*?)*$')
>>> rx.search(string).group(2)
'base_field_name'
>>> rx.search('base_field_name').group(2)
'base_field_name'

Upvotes: 2

just somebody

Reputation: 19257

Use either a table-driven parser, as Alex Martelli suggests, or a hand-written recursive descent parser. Neither is hard to write, and both are quite rewarding.
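
To give an idea of the hand-written route, here is a small recursive descent sketch. The grammar is an assumption about what your fields look like (names, numbers, single-quoted strings and function calls, optionally followed by an alias), and every name in it is made up for illustration. It returns the name reached by following the first argument of each call.

```python
import re

# One token: a name, a quoted string, a number, or one of ( ) ,
TOKEN = re.compile(r"\s*([A-Za-z_][\w.$]*|'[^']*'|\d+(?:\.\d+)?|[(),])")

def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError('unexpected character at position %d' % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

class FieldParser(object):
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        if token is None:
            raise ValueError('unexpected end of input')
        self.pos += 1
        return token

    def parse_expr(self):
        # expr := atom | name '(' [expr (',' expr)*] ')'
        token = self.advance()
        if token in ('(', ')', ','):
            raise ValueError('expected a name or literal, got %r' % token)
        if self.peek() != '(':
            return token                      # plain atom
        self.advance()                        # consume '('
        args = []
        if self.peek() != ')':
            args.append(self.parse_expr())    # recursion handles the nesting
            while self.peek() == ',':
                self.advance()
                args.append(self.parse_expr())
        if self.advance() != ')':
            raise ValueError('expected )')
        return args[0] if args else token

def extract_base_field_name(field_text):
    # Anything left after the expression (alias, trailing comma) is ignored.
    return FieldParser(field_text).parse_expr()

print(extract_base_field_name('ltrim(rtrim(to_char(base_field_name, format))) renamed_field_name,'))
```

Each call to parse_expr handles one level of parentheses and calls itself for the arguments, which is exactly the part a single regex cannot express.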

Upvotes: 2

Alex Martelli

Reputation: 882681

Regular expressions are not suitable for parsing "nested" structures. Try, instead, a full-fledged parsing kit such as pyparsing. Examples of using pyparsing specifically to parse SQL can be found here and here (you'll no doubt need to take the examples just as a starting point and write some parsing code of your own, but it's definitely not too difficult).
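
To illustrate the point about nesting (this is plain Python with an explicit depth counter, not pyparsing): splitting a select list into fields already requires tracking how deep inside parentheses you are, which a single regex cannot do for arbitrary depth. The function name is hypothetical, and the sketch assumes no commas or parentheses inside quoted strings.

```python
def split_top_level(select_list):
    """Split on commas that are not inside any parentheses."""
    fields, current, depth = [], [], 0
    for ch in select_list:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
        if ch == ',' and depth == 0:
            # A comma at depth 0 separates two fields.
            fields.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = ''.join(current).strip()
    if tail:
        fields.append(tail)
    return fields

print(split_top_level('ltrim(rtrim(to_char(a, fmt))) x, b, nvl(c, 0) y'))
```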

Upvotes: 11
