aleroot

Reputation: 72636

Replace named captured groups with arbitrary values in Python

I need to replace the value inside a capture group of a regular expression with some arbitrary value. I've looked at re.sub, but it seems to work differently.

I have a string like this one:

s = 'monthday=1, month=5, year=2018'

and a regex that matches it with named capturing groups, like the following:

import re

regex = re.compile(r'monthday=(?P<d>\d{1,2}), month=(?P<m>\d{1,2}), year=(?P<Y>20\d{2})')

Now I want to replace the group named d with aaa, the group named m with bbb and the group named Y with ccc, as in the following example:

'monthday=aaa, month=bbb, year=ccc'

Basically, I want to keep all the non-matching text and substitute each matched group with an arbitrary value.

Is there a way to achieve the desired result?

Note

This is just an example; I could have other input regexes with a different structure but the same named capturing groups...

Update

Since most people seem to be focusing on the sample data, here is another sample. Let's say I have this other input data and regex:

input = '2018-12-12'
regex = r'((?P<Y>20\d{2})-(?P<m>[0-1]?\d)-(?P<d>\d{2}))'

As you can see, I still have the same three named capturing groups with the same names, but the structure is totally different. What I need, as before, is to replace each capturing group with some arbitrary text:

'ccc-bbb-aaa'

That is, replace the capture group named Y with ccc, the capture group named m with bbb and the capture group named d with aaa.

If regexes are not the best tool for the job, I'm open to other proposals that achieve my goal.

Upvotes: 7

Views: 3623

Answers (5)

lains

Reputation: 1

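import re

def replace_named_group_with_dict_values(pattern: str, text: str, mapping: dict) -> str:
    match = re.search(pattern, text)
    if match is None:
        return text
    # only named groups that exist in the pattern and took part in the match
    names = [k for k in mapping if k in match.groupdict() and match.start(k) != -1]
    # splice right-to-left so earlier offsets stay valid after each replacement
    for k in sorted(names, key=match.start, reverse=True):
        text = text[:match.start(k)] + str(mapping[k]) + text[match.end(k):]
    return text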

values = {
    'd' : 'aaa',
    'm': 'bbb',
    'Y': 'ccc',
}
s = 'monthday=1, month=5, year=2018'
p = r'monthday=(?P<d>\d{1,2}), month=(?P<m>\d{1,2}), year=(?P<Y>20\d{2})'
print(replace_named_group_with_dict_values(p,s,values))

Upvotes: 0

RomanPerekhrest

Reputation: 92854

An extended Python 3.x solution on an extended example, using re.sub() with a replacement function:

import re

d = {'d':'aaa', 'm':'bbb', 'Y':'ccc'}  # predefined dict of replace words
pat = re.compile(r'(monthday=)(?P<d>\d{1,2})|(month=)(?P<m>\d{1,2})|(year=)(?P<Y>20\d{2})')

def repl(m):
    pair = next(t for t in m.groupdict().items() if t[1])
    k = next(filter(None, m.groups()))  # preceding `key` for currently replaced sequence (i.e. 'monthday=' or 'month=' or 'year=')
    return k + d.get(pair[0], '')

s = 'Data: year=2018, monthday=1, month=5, some other text'
result = pat.sub(repl, s)

print(result)

The output:

Data: year=ccc, monthday=aaa, month=bbb, some other text

For Python 2.7 (where filter() returns a list, not an iterator), change the line k = next(filter(None, m.groups())) to:

k = filter(None, m.groups())[0]
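A possibly simpler variant of the same approach, assuming each value is always preceded by a fixed key such as 'monthday=': move the keys into lookbehinds so that every alternative holds exactly one named group, then let Match.lastgroup pick the replacement. The rewritten pattern is an illustration, not part of the answer above.

```python
import re

d = {'d': 'aaa', 'm': 'bbb', 'Y': 'ccc'}
# lookbehinds keep the 'key=' prefixes out of the match, so each
# alternative contains exactly one (named) capturing group
pat = re.compile(r'(?<=monthday=)(?P<d>\d{1,2})'
                 r'|(?<=month=)(?P<m>\d{1,2})'
                 r'|(?<=year=)(?P<Y>20\d{2})')

s = 'Data: year=2018, monthday=1, month=5, some other text'
# Match.lastgroup is the name of the group matched by the winning alternative
result = pat.sub(lambda m: d[m.lastgroup], s)
print(result)  # Data: year=ccc, monthday=aaa, month=bbb, some other text
```

Lookbehinds in Python's re must be fixed-width, which holds here because each key is a literal.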

Upvotes: 2

Aran-Fey

Reputation: 43246

This is a completely backwards use of regex. The point of capture groups is to hold text you want to keep, not text you want to replace.

Since you've written your regex the wrong way, you have to do most of the substitution operation manually:

"""
Replaces the text captured by named groups.
"""
def replace_groups(pattern, string, replacements):
    pattern = re.compile(pattern)
    # create a dict of {group_index: group_name} for use later
    groupnames = {index: name for name, index in pattern.groupindex.items()}

    def repl(match):
        # we have to split the matched text into chunks we want to keep and
        # chunks we want to replace
        # captured text will be replaced. uncaptured text will be kept.
        text = match.group()
        chunks = []
        lastindex = 0
        for i in range(1, pattern.groups+1):
            groupname = groupnames.get(i)
            if groupname not in replacements:
                continue

            # keep the text between this match and the last
            chunks.append(text[lastindex:match.start(i)])
            # then instead of the captured text, insert the replacement text for this group
            chunks.append(replacements[groupname])
            lastindex = match.end(i)
        chunks.append(text[lastindex:])
        # join all the junks to obtain the final string with replacements
        return ''.join(chunks)

    # for each occurence call our custom replacement function
    return re.sub(pattern, repl, string)
>>> replace_groups(pattern, s, {'d': 'aaa', 'm': 'bbb', 'Y': 'ccc'})
'monthday=aaa, month=bbb, year=ccc'
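To make the point about capture groups concrete: if the regex captures the text to keep instead, a plain re.sub with backreferences does the job. A minimal sketch for the first sample only; the pattern is rewritten for illustration and isn't from the question.

```python
import re

s = 'monthday=1, month=5, year=2018'
# capture the text to KEEP; the digits are matched but not captured
keep = re.compile(r'(monthday=)\d{1,2}(, month=)\d{1,2}(, year=)20\d{2}')
# \g<n> re-inserts the kept text around the new values
result = keep.sub(r'\g<1>aaa\g<2>bbb\g<3>ccc', s)
print(result)  # monthday=aaa, month=bbb, year=ccc
```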

Upvotes: 9

Ajax1234
Ajax1234

Reputation: 71461

You can use string formatting with a regex substitution:

import re
s = 'monthday=1, month=5, year=2018'
s = re.sub(r'(?<=\=)\d+', '{}', s).format(*['aaa', 'bbb', 'ccc'])

Output:

'monthday=aaa, month=bbb, year=ccc'
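A variant that avoids str.format, which would fail on any literal { or } elsewhere in the string: give re.sub a function that pulls the next value from an iterator. This is a sketch that assumes there are exactly as many numbers after '=' as values, since next() raises StopIteration once the iterator is exhausted.

```python
import re

s = 'monthday=1, month=5, year=2018'
values = iter(['aaa', 'bbb', 'ccc'])  # consumed in match order, left to right
# each number that follows '=' is replaced by the next value
result = re.sub(r'(?<=\=)\d+', lambda m: next(values), s)
print(result)  # monthday=aaa, month=bbb, year=ccc
```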

Edit: given an arbitrary input string and regex, you can use formatting like so:

input = '2018-12-12'
regex = r'((?P<Y>20\d{2})-(?P<m>[0-1]?\d)-(?P<d>\d{2}))'
new_s = re.sub(regex, '{}', input).format(*["aaa", "bbb", "ccc"])
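As written, re.sub(regex, '{}', input) swaps the entire match for a single '{}', so format() only uses the first value and the result is 'aaa'. A minimal sketch that keeps the formatting idea but works per named group: turn each named group's span into a {name} placeholder, then fill the template with str.format. to_template is a hypothetical helper, and it assumes the named groups don't nest inside each other.

```python
import re

def to_template(regex, text):
    """Replace each named-group span with a '{name}' placeholder.

    Assumes named groups don't nest. Literal braces in the kept text are
    doubled so that str.format treats them as plain text.
    """
    def escape(chunk):
        return chunk.replace('{', '{{').replace('}', '}}')

    pattern = re.compile(regex)
    pieces, last = [], 0
    for match in pattern.finditer(text):
        # named groups that took part in this match, in left-to-right order
        spans = sorted((match.start(name), match.end(name), name)
                       for name in pattern.groupindex
                       if match.start(name) != -1)
        for start, end, name in spans:
            pieces.append(escape(text[last:start]))
            pieces.append('{' + name + '}')
            last = end
    pieces.append(escape(text[last:]))
    return ''.join(pieces)

regex = r'((?P<Y>20\d{2})-(?P<m>[0-1]?\d)-(?P<d>\d{2}))'
template = to_template(regex, '2018-12-12')        # '{Y}-{m}-{d}'
new_s = template.format(Y='ccc', m='bbb', d='aaa')  # 'ccc-bbb-aaa'
```

Because the placeholders are named, the values are matched by group name rather than by position, so the same call works for both sample regexes.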

Upvotes: 2

Tobey

Reputation: 1440

I suggest using a loop:

import re
regex = re.compile(r'monthday=(?P<d>\d{1,2}), month=(?P<m>\d{1,2}), year=(?P<Y>20\d{2})')
s = 'monthday=1, month=1, year=2017   \n'
s+= 'monthday=2, month=2, year=2019'


regex_as_str =  'monthday={d}, month={m}, year={Y}'
matches = [match.groupdict() for match in regex.finditer(s)]
for match in matches:
    s = s.replace(
        regex_as_str.format(**match),
        regex_as_str.format(**{'d': 'aaa', 'm': 'bbb', 'Y': 'ccc'})
    )    

You can do this multiple times with your different regex patterns.

Or you can join ("or") the patterns together.
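One caveat with the "or" idea, shown in the sketch here: Python's re rejects a pattern that defines the same group name twice, so '|'.join(...) of two patterns that both use d, m and Y raises re.error. Applying the patterns one after another avoids that. replace_named and the combined sample string are hypothetical, chosen for illustration.

```python
import re

values = {'d': 'aaa', 'm': 'bbb', 'Y': 'ccc'}
patterns = [
    r'monthday=(?P<d>\d{1,2}), month=(?P<m>\d{1,2}), year=(?P<Y>20\d{2})',
    r'(?P<Y>20\d{2})-(?P<m>[0-1]?\d)-(?P<d>\d{2})',
]

def replace_named(pattern, text, values):
    """Replace every named group listed in `values`, in every match of `pattern`."""
    compiled = re.compile(pattern)

    def repl(match):
        # named groups that took part in this match, left to right
        spans = sorted((match.start(n), match.end(n), n) for n in values
                       if n in compiled.groupindex and match.start(n) != -1)
        out, pos = [], match.start()
        for start, end, name in spans:
            out.append(match.string[pos:start])  # keep text before the group
            out.append(values[name])             # swap in the new value
            pos = end
        out.append(match.string[pos:match.end()])  # keep the rest of the match
        return ''.join(out)

    return compiled.sub(repl, text)

s = 'monthday=1, month=5, year=2018; 2018-12-12'
for p in patterns:  # apply the patterns one after another
    s = replace_named(p, s, values)
print(s)  # monthday=aaa, month=bbb, year=ccc; ccc-bbb-aaa
```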

Upvotes: 0
