R.G


Python pattern to replace words between single or double quotes

I am new to Python and not good with regex. I need to modify a pattern in existing code.

I have extracted the code that I am trying to fix.

import re

def replacer_factory(spelling_dict):
    def replacer(match):
        word = match.group()
        return spelling_dict.get(word, word)
    return replacer

def main():
    repkeys = {'modify': 'modifyNew', 'extract': 'extractNew'}
    with open('test.xml', 'r') as file :
        filedata = file.read()
    pattern = r'\b\w+\b' # this pattern matches whole words only
    #pattern = r'[\'"]\w+[\'"]'
    #pattern = r'["]\w+["]' 
    #pattern = '\b[\'"]\w+[\'"]\b'
    #pattern = '(["\'])(?:(?=(\\?))\2.)*?\1'

    replacer = replacer_factory(repkeys)
    filedata = re.sub(pattern, replacer, filedata)

if __name__ == '__main__':
    main()

Input

<fn:modify ele="modify">
<fn:extract name='extract' value="Title"/>
</fn:modify>

Expected output. Note that the words to replace can be enclosed in single or double quotes.

<fn:modify ele="modifyNew">
<fn:extract name='extractNew' value="Title"/>
</fn:modify>

The existing pattern r'\b\w+\b' results in, for example, <fn:modifyNew ele="modifyNew">, but what I am looking for is <fn:modify ele="modifyNew">.

The patterns I have attempted so far are given as comments. I realized late that a couple of them are wrong, because only string literals prefixed with r get the special handling of backslashes. I am still including them so that everything I have tried can be reviewed.

It would be great to get a pattern that solves this, rather than changing the logic. If this cannot be achieved with the existing code, please point that out as well. The environment I work in has Python 2.6.

Any help is appreciated.

Upvotes: 1

Views: 116

Answers (1)

Wiktor Stribiżew


Use the regex r'''(['"])(\w+)\1''' and adapt the replacer function to work with its capture groups:

def replacer_factory(spelling_dict):
    def replacer(match):
        return '{0}{1}{0}'.format(match.group(1), spelling_dict.get(match.group(2), match.group(2)))
    return replacer

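The pattern (['"])(\w+)\1 matches a word enclosed in either double or single quotes. Group 1 captures the opening quote, the \1 backreference requires the same quote character to close the word, and the word itself is in Group 2, hence the use of spelling_dict.get(match.group(2), match.group(2)). Because the quotes are part of the match, they must be put back, hence the '{0}{1}{0}'.format().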
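To make the group layout concrete, here is a small sketch that inspects a single match. The names quoted_word and m are only illustrative and the sample strings are taken from the question's input:

```python
import re

# Same pattern as above, compiled once for repeated use.
quoted_word = re.compile(r'''(['"])(\w+)\1''')

m = quoted_word.search('<fn:modify ele="modify">')
print(m.group(0))  # the whole match, quotes included
print(m.group(1))  # the opening quote character
print(m.group(2))  # the word between the quotes

# Because of the \1 backreference, a word opened with one kind of
# quote and closed with the other does not match at all.
mismatched = quoted_word.search("""name='extract" """)
print(mismatched)
```

The unquoted modify in the tag name is skipped because it is not preceded by a quote, which is what keeps <fn:modify intact.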

See the Python demo:

import re
def replacer_factory(spelling_dict):
    def replacer(match):
        return '{0}{1}{0}'.format(match.group(1), spelling_dict.get(match.group(2), match.group(2)))
    return replacer

repkeys = {'modify': 'modifyNew', 'extract': 'extractNew'}
pattern = r'''(['"])(\w+)\1'''
replacer = replacer_factory(repkeys)
filedata = """<fn:modify ele="modify">
<fn:extract name='extract' value="Title"/>
</fn:modify>"""
print( re.sub(pattern, replacer, filedata) )

Output:

<fn:modify ele="modifyNew">
<fn:extract name='extractNew' value="Title"/>
</fn:modify>

Upvotes: 1
