Brandon Lorenz

Reputation: 221

How to replace the Nth appearance of a needle in a haystack? (Python)

I am trying to replace the Nth appearance of a needle in a haystack. I want to do this simply via re.sub(), but cannot come up with an appropriate regex. I am trying to adapt http://docstore.mik.ua/orelly/perl/cookbook/ch06_06.htm, but I suspect my attempt fails to match across multiple lines.

My current method is an iterative approach that finds the position of each occurrence from the beginning after each mutation. This is pretty inefficient and I would like to get some input. Thanks!

Upvotes: 2

Views: 1549

Answers (6)

woot

Reputation: 7606

I have a similar function I wrote to do this. I was trying to replicate SQL REGEXP_REPLACE() functionality. I ended up with:

def sql_regexp_replace( txt, pattern, replacement='', position=1, occurrence=0, regexp_modifier='c'):
    class ReplWrapper(object):
        def __init__(self, replacement, occurrence):
            self.count = 0
            self.replacement = replacement
            self.occurrence = occurrence
        def repl(self, match):
            self.count += 1
            if self.occurrence == 0 or self.occurrence == self.count:
                return match.expand(self.replacement)
            else:
                # Leave every other match untouched.
                return match.group(0)
    occurrence = 0 if occurrence < 0 else occurrence
    flags = regexp_flags(regexp_modifier)
    rx = re.compile(pattern, flags)
    replw = ReplWrapper(replacement, occurrence)
    return txt[0:position-1] + rx.sub(replw.repl, txt[position-1:])

One important note that I haven't seen mentioned is that you need to return match.expand() otherwise it won't expand the \1 templates properly and will treat them as literals.

If you want this to work you'll need to handle the flags differently (or take it from my github, it's simple to implement and you can dummy it for a test by setting it to 0 and ignoring my call to regexp_flags()).
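To illustrate the match.expand() point above, here is a minimal sketch, assuming Python 3. The name replace_nth and the sample strings are made up for the illustration and are not part of the function above. It shows that a callback's return value is used as-is, so a template such as \2@\1 only has its group references filled in if the callback calls match.expand() itself:

```python
import re


def replace_nth(pattern, template, text, n):
    """Replace only the n-th match (1-based), expanding group references."""
    count = 0

    def repl(match):
        nonlocal count
        count += 1
        # expand() substitutes group references in the template; returning
        # the template directly would insert it verbatim.
        return match.expand(template) if count == n else match.group(0)

    return re.sub(pattern, repl, text)


expanded = replace_nth(r"(\w+)@(\w+)", r"\2@\1", "a@b c@d e@f", 2)

# For comparison: a callback that returns the template unexpanded.
literal = re.sub(r"(\w+)@(\w+)", lambda m: r"\2@\1", "a@b c@d e@f", count=1)

print(expanded)  # a@b d@c e@f
print(literal)   # \2@\1 c@d e@f
```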

Upvotes: 0

Ted Striker

Reputation: 81

If the pattern ("needle") or replacement is a complex regular expression, you can't assume anything. The function "nth_occurrence_sub" is what I came up with as a more general solution:

def nth_match_end(pattern, string, n, flags):
    for i, match_object in enumerate(re.finditer(pattern, string, flags)):
        if i + 1 == n:
            return match_object.end()


def nth_occurrence_sub(pattern, repl, string, n=0, flags=0):
    max_n = len(re.findall(pattern, string, flags))
    if abs(n) > max_n or n == 0:
        return string
    if n < 0:
        n = max_n + n + 1
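    # Pass count and flags by keyword; passing them positionally is
    # deprecated in recent Python versions.
    sub_n_times = re.sub(pattern, repl, string, count=n, flags=flags)
    if n == 1:
        return sub_n_times
    nm1_end = nth_match_end(pattern, string, n - 1, flags)
    sub_nm1_times = re.sub(pattern, repl, string, count=n - 1, flags=flags)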
    sub_nm1_change = sub_nm1_times[:-1 * len(string[nm1_end:])]
    components = [
        string[:nm1_end],
        sub_n_times[len(sub_nm1_change):]
        ]
    return ''.join(components)

Upvotes: 0

Gabi Purcaru

Reputation: 31524

I've been struggling for a while with this, but I found a solution that I think is pretty pythonic:

>>> def nth_matcher(n, replacement):
...     def alternate(n):
...         i=0
...         while True:
...             i += 1
...             yield i%n == 0
...     gen = alternate(n)
...     def match(m):
...         replace = next(gen)
...         if replace:
...             return replacement
...         else:
...             return m.group(0)
...     return match
...     
... 
>>> re.sub("([0-9])", nth_matcher(3, "X"), "1234567890")
'12X45X78X0'

EDIT: the matcher consists of two parts:

  1. The alternate(n) function. This returns a generator that yields an infinite sequence of True/False values in which every nth value is True. The values of alternate(3) begin False, False, True, False, False, True, False, ...

  2. The match(m) function. This is the function that gets passed to re.sub: it gets the next value from alternate(n) with next(gen) (which works on Python 2.6+ and Python 3, whereas gen.next() is Python 2 only) and, if it's True, replaces the matched value; otherwise it keeps the match unchanged (replaces it with itself). Note that this replaces every nth match, not only the nth one.

I hope this is clear enough. If my explanation is hazy, please say so and I'll improve it.
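As a variation on the idea above, here is a rough Python 3 sketch that uses itertools.count instead of a hand-written generator. The name make_nth_matcher and the repeat parameter are hypothetical, added only for this illustration: repeat=True replaces every nth match, while repeat=False replaces just the nth match, which is closer to what the question asks for.

```python
import itertools
import re


def make_nth_matcher(n, replacement, repeat=True):
    """Build a re.sub callback that replaces every n-th or only the n-th match."""
    counter = itertools.count(1)  # yields 1, 2, 3, ... one value per match

    def match(m):
        i = next(counter)
        hit = (i % n == 0) if repeat else (i == n)
        return replacement if hit else m.group(0)

    return match


every_third = re.sub(r"[0-9]", make_nth_matcher(3, "X"), "1234567890")
only_third = re.sub(r"[0-9]", make_nth_matcher(3, "X", repeat=False), "1234567890")

print(every_third)  # 12X45X78X0
print(only_third)   # 12X4567890
```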

Upvotes: 1

Felix Kling

Reputation: 816334

I think you mean re.sub. You could pass a function and keep track of how often it was called so far:

def replaceNthWith(n, replacement):
    def replace(match, c=[0]):
        c[0] += 1
        return replacement if c[0] == n else match.group(0)
    return replace

Usage:

re.sub(pattern, replaceNthWith(n, replacement), str)

But this approach feels a bit hacky, maybe there are more elegant ways.
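One thing worth knowing about the counter trick above: the count lives inside the returned function, so it carries over if the same callback is reused for a second re.sub call. A small sketch of that behaviour, using made-up sample strings (the variable names first, second and fresh are only for the illustration):

```python
import re


def replaceNthWith(n, replacement):
    # The default list is created once per call to replaceNthWith,
    # so each returned callback has its own counter.
    def replace(match, c=[0]):
        c[0] += 1
        return replacement if c[0] == n else match.group(0)
    return replace


replacer = replaceNthWith(2, "X")
first = re.sub("a", replacer, "aaa")    # counter runs 1, 2, 3
second = re.sub("a", replacer, "aaa")   # counter continues at 4, 5, 6
fresh = re.sub("a", replaceNthWith(2, "X"), "aaa")  # new callback, new counter

print(first)   # aXa
print(second)  # aaa
print(fresh)   # aXa
```

So it is probably safest to build a new callback for every substitution.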

Upvotes: 3

Jacob Eggers

Reputation: 9322

Something like this regex should help you, though I'm not sure how efficient it is:

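# N=3: skip the first N-1 = 2 occurrences, then replace the next one
re.sub(
  r'^((?:.*?mytexttoreplace){2}.*?)mytexttoreplace',
  r'\1yourreplacementtext.',  # raw string, otherwise '\1' is the character chr(1)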
  'mystring',
  flags=re.DOTALL
)

The DOTALL flag is important: without it, . does not match newlines, so the pattern cannot span multiple lines.
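A possible way to parametrize the regex above for any N, as a sketch only. The helper name replace_nth_literal is hypothetical, and it assumes the needle is a literal string (hence re.escape) and that the replacement should be inserted verbatim:

```python
import re


def replace_nth_literal(haystack, needle, replacement, n):
    """Replace the n-th (1-based) occurrence of a literal needle via one regex."""
    esc = re.escape(needle)
    # Lazily skip n-1 occurrences, capturing everything before the n-th one.
    pattern = r'^((?:.*?%s){%d}.*?)%s' % (esc, n - 1, esc)
    # A function is used as repl so the replacement text is taken literally.
    return re.sub(pattern, lambda m: m.group(1) + replacement,
                  haystack, count=1, flags=re.DOTALL)


text = "a-b\na-b\na-b"
print(repr(replace_nth_literal(text, "a", "X", 3)))  # 'a-b\na-b\nX-b'
print(repr(replace_nth_literal(text, "a", "X", 4)))  # unchanged, only 3 matches
```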

Upvotes: 2

Matt Warren

Reputation: 686

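Could you do it using re.finditer with MatchObject.start() and MatchObject.end()? (re.findall returns only the matched strings, not match objects, so it cannot give you positions.)

Find the occurrences of the pattern in the string with re.finditer, get the indices of the Nth occurrence with .start()/.end(), then build a new string with the replacement value using those indices.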
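The idea above could look roughly like this. It is a sketch, not a tested library function; the name replace_nth_by_span is made up, and it uses re.finditer together with itertools.islice so that scanning stops at the Nth match instead of collecting all of them:

```python
import re
from itertools import islice


def replace_nth_by_span(pattern, replacement, text, n):
    """Replace the n-th (1-based) match by slicing around its span."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # Advance the lazy match iterator to the n-th match, or None if absent.
    match = next(islice(re.finditer(pattern, text), n - 1, None), None)
    if match is None:
        return text  # fewer than n matches: leave the text unchanged
    # expand() lets the replacement use group references such as \1.
    return text[:match.start()] + match.expand(replacement) + text[match.end():]


print(replace_nth_by_span(r"\d", "X", "1234567890", 3))  # 12X4567890
```

Because the string is rebuilt only once, this avoids re-searching from the beginning after each mutation, which was the inefficiency mentioned in the question.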

Upvotes: 0
