Frank Epps

Reputation: 580

Calling a function on captured group in re.sub()

>>> base64_encode = lambda url : url.encode('base64').replace('\n', '')
>>> s = '<A HREF="http://www.google.com" ID="test">blah</A>'
>>> re.sub(r'(?<=href=")([\w:/.]+)(?=")', base64_encode(r'\1'), s, flags=re.I)
<A HREF="XDE=" ID="test">blah</A>

The base64 encoding of the string http://www.google.com is aHR0cDovL3d3dy5nb29nbGUuY29t, not XDE=, which is the encoding of the literal string \1.

How do I pass the captured group into the function?

Upvotes: 6

Views: 3402

Answers (2)

rmunn

Reputation: 36708

Write your function to take a single parameter, which will be a match object (see http://docs.python.org/2.7/library/re.html#match-objects for details on these). Inside your function, use m.group(1) to get the first group from your match object m, and return the string that should replace the match.

When you pass the function to re.sub, pass the function itself, without parentheses, so that re.sub can call it once per match:

re.sub("some regex", my_match_function, s, flags=re.I)
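To make the calling convention concrete, here is a minimal sketch using the regex and input from the question. It assumes Python 3, and bracket_group and seen are hypothetical names used only for illustration. The callback does nothing useful beyond wrapping the captured URL in brackets, which is enough to show that re.sub hands it a match object rather than the text \1.

```python
import re

s = '<A HREF="http://www.google.com" ID="test">blah</A>'
pattern = r'(?<=href=")([\w:/.]+)(?=")'

seen = []  # records what re.sub actually passes to the callback

def bracket_group(m):
    # m is a match object; group(1) is the text captured by the first group
    seen.append(m.group(1))
    return '[' + m.group(1) + ']'

# Note: bracket_group is passed by name, not called with parentheses
result = re.sub(pattern, bracket_group, s, flags=re.I)
print(seen)    # ['http://www.google.com']
print(result)  # <A HREF="[http://www.google.com]" ID="test">blah</A>
```

Writing bracket_group(...) with parentheses would instead run the function once, before re.sub starts, and pass its return value as a fixed replacement string.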

Upvotes: 4

mgilson

Reputation: 310069

Pass a function to re.sub as the replacement. It receives the match object, and you pull the group from that:

def base64_encode(match):
    """
    Take a re match object and return the replacement
    string for that match.
    """

    group = match.group(1)
    ...  # code to encode group as base64
    return result

re.sub(..., base64_encode, s, flags=re.I)
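For reference, one way the skeleton above could be filled in for the question's input. This is a sketch under stated assumptions, not the only way to do it: it assumes Python 3, where str.encode('base64') from the question is no longer available, so it uses the standard base64 module instead. The regex and test string are copied from the question.

```python
import base64
import re

def base64_encode(match):
    """Return the base64 encoding of the first captured group."""
    url = match.group(1)
    # b64encode works on bytes and returns bytes, so encode and decode around it
    return base64.b64encode(url.encode('utf-8')).decode('ascii')

s = '<A HREF="http://www.google.com" ID="test">blah</A>'
result = re.sub(r'(?<=href=")([\w:/.]+)(?=")', base64_encode, s, flags=re.I)
print(result)  # <A HREF="aHR0cDovL3d3dy5nb29nbGUuY29t" ID="test">blah</A>
```

Unlike the Python 2 'base64' codec, base64.b64encode does not insert newlines, so the .replace('\n', '') step from the question is not needed here.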

Upvotes: 12
