Reputation: 3021
I am using Python 2.6.4.
I have a series of select statements in a text file and I need to extract the field names from each select query. This would be easy if some of the fields didn't use nested functions like to_char() etc.
Given select statement fields that could have several nested parenthese like "ltrim(rtrim(to_char(base_field_name, format))) renamed_field_name," or the simple case of just "base_field_name" as a field, is it possible to use Python's re module to write a regex to extract base_field_name? If so, what would the regex look like?
Upvotes: 3
Views: 2201
Reputation:
Here's a really hacky parser that does what you want.
It works by calling 'eval' on the text to be parsed, mapping all identifiers to a function which returns its first argument (which I'm guessing is what you want given your example).
class FakeFunction(object):
def __init__(self, name):
self.name = name
def __call__(self, *args):
return args[0]
def __str__(self):
return self.name
class FakeGlobals(dict):
def __getitem__(self, x):
return FakeFunction(x)
def ExtractBaseFieldName(x):
return eval(x, FakeGlobals())
print ExtractBaseFieldName('ltrim(rtrim(to_char(base_field_name, format)))')
Upvotes: 1
Reputation: 9172
Do you really need regular expressions? To get the one you've got up there I'd use
s[s.rfind('(')+1:s.find(')')].split(',')[0]
with 's' containing the original string.
Of course, it's not a general solution, but...
Upvotes: 0
Reputation: 6326
This may be good enough:
import re
print re.match(r".*\(([^\)]+)\)", "ltrim(to_char(field_name, format)))").group(1)
You would need to do further processing. For example pick up the function name as well and pull the field name according to function signature.
.*(\w+)\(([^\)]+)\)
Upvotes: 1
Reputation: 16625
>>> import re
>>> string = 'ltrim(rtrim(to_char(base_field_name, format))) renamed_field_name'
>>> rx = re.compile('^(.*?\()*(.+?)(,.*?)*(,|\).*?)*$')
>>> rx.search(string).group(2)
'base_field_name'
>>> rx.search('base_field_name').group(2)
'base_field_name'
Upvotes: 2
Reputation: 19257
Either a table-driven parser as Alex Martelli suggests or a hand-written recursive descent parser. They're not hard and quite rewarding to write.
Upvotes: 2
Reputation: 882681
Regular expressions are not suitable for parsing "nested" structures. Try, instead, a full-fledged parsing kit such as pyparsing -- examples of using pyparsing specifically to parse SQL can be found here and here, for example (you'll no doubt need to take the examples just as a starting point, and write some parsing code of your own, but, it's definitely not too difficult).
Upvotes: 11