Sam Hasler

Reputation: 12616

regex to duplicate repeated patterns, substituting part of the pattern

I'd like to duplicate multiple matches in a line, substituting part of each match, while keeping runs of adjacent matches together (that seems to be the tricky part).

e.g.:

Regex:
(x(\d)(,)?)

Replacement:
X$2,O$2$3

Input:
x1,x2,Z3,x4,Z5,x6

Output: (repeated groups broken apart)
X1,O1,X2,O2,Z3,X4,O4,Z5,X6,O6

Desired output (repeated groups, "X1,X2" kept together):
X1,X2,O1,O2,Z3,X4,O4,Z5,X6,O6

Demo: https://regex101.com/r/gH9tL9/1
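For reference, the same behaviour can be reproduced outside regex101. This is a minimal sketch in Python (my choice of language for illustration, not part of the original demo); the callback `per_match` is a hypothetical helper that mirrors the replacement `X$2,O$2$3`, treating an unmatched group 3 as an empty string.

```python
import re

# Pattern and input taken from the question above.
pattern = re.compile(r"(x(\d)(,)?)")
text = "x1,x2,Z3,x4,Z5,x6"

def per_match(m):
    # Equivalent of "X$2,O$2$3"; group 3 is None when there is no trailing comma.
    return "X{0},O{0}{1}".format(m.group(2), m.group(3) or "")

result = pattern.sub(per_match, text)
print(result)  # X1,O1,X2,O2,Z3,X4,O4,Z5,X6,O6
```

Each match is replaced independently, so the replacement has no way of knowing that x1 and x2 sit next to each other.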

Is this possible with regex or do I need to use something else?


Update: Will's answer is what I expected. It occurs to me that it might be possible with multiple regex passes.

Upvotes: 1

Views: 2095

Answers (2)

Sam Hasler

Reputation: 12616

In my specific case I'm using PowerShell, so I was able to come up with the following:

(linebreaks added for readability)

("x1,x2,z3,x4,z5,x6"
   -split '((?<=x\d),(?!x)|(?<!x\d),(?=x))' 
   | Foreach-Object {
      if ($_ -match 'x') {
        $_ + ',' + ($_ -replace 'x','y')
      } else {$_}
     }
) -join ''

Outputs:
x1,x2,y1,y2,z3,x4,y4,z5,x6,y6

Where:

-split '((?<=x\d),(?!x)|(?<!x\d),(?=x))'

breaks apart the string into these groups:

x1,x2
,    
z3   
,    
x4   
,    
z5   
,    
x6

using positive and negative lookahead and lookbehind:

comma with x\d before and without x after:
(?<=x\d),(?!x)

comma without x\d before and with x after:
(?<!x\d),(?=x)
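The same split-then-duplicate idea should carry over to other languages whose split keeps captured separators. Here is a rough Python sketch, assuming `re.split` with a capturing group behaves like PowerShell's `-split` for this pattern; the names `SPLIT_RE` and `duplicate_runs` are made up for illustration.

```python
import re

# Same split pattern as the PowerShell snippet; the outer capturing group
# makes re.split keep the separating commas in the result list.
SPLIT_RE = re.compile(r"((?<=x\d),(?!x)|(?<!x\d),(?=x))")

def duplicate_runs(text):
    out = []
    for piece in SPLIT_RE.split(text):
        if "x" in piece:
            # A run such as "x1,x2": emit it, then its copy with x replaced by y.
            out.append(piece + "," + piece.replace("x", "y"))
        else:
            # Separator commas and non-x items pass through unchanged.
            out.append(piece)
    return "".join(out)

pieces = SPLIT_RE.split("x1,x2,z3,x4,z5,x6")
result = duplicate_runs("x1,x2,z3,x4,z5,x6")
print(pieces)  # ['x1,x2', ',', 'z3', ',', 'x4', ',', 'z5', ',', 'x6']
print(result)  # x1,x2,y1,y2,z3,x4,y4,z5,x6,y6
```

Both lookbehinds are fixed-width (two characters), which is what Python's `re` module requires.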

Upvotes: 1

Will Barnwell

Reputation: 4089

You would have to capture the repeating patterns as one match and write out the replacement for the whole repeating pattern at once. Your current pattern cannot tell that your first and second matches, x1 and x2, are adjacent.

I'm going to say no, this is not possible with one pure regex.

This is because of two important facts about capture groups and replacing.

  1. Repeated capture groups will return the last capture:

    Regexes can capture patterns that repeat an arbitrary number of times by using the forms <PATTERN>{1,}, <PATTERN>+ or <PATTERN>*. However, in most regex flavors any capture group within <PATTERN> only returns the capture from the last iteration of the pattern. This prevents you from capturing every match in a run that repeats an arbitrary number of times.

"Hold on", you might say, "I only want to capture patterns that repeat one or two times, I could use (x(\d)(,)?)(x(\d)(,)?)?", which brings us to point 2.

  2. There is no conditional replacement:

    Using the above pattern we could get your desired output for the repeated match, but not without mangling the solo match replacement. See: https://regex101.com/r/gH9tL9/2 Without the ability to turn off sections of the replacement based on the existence of capture groups, we cannot achieve the desired output.
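Both facts can be checked with a short Python sketch. The replacement template inside `fixed_template` is a hypothetical one (roughly `X$2,X$5,O$2,O$5$6`), chosen only to illustrate the problem; it is not necessarily the one used in the regex101 demo.

```python
import re

text = "x1,x2,Z3,x4,Z5,x6"

# Fact 1: a capture group inside a repeated group keeps only the last iteration.
run = re.match(r"(?:x(\d),?)+", "x1,x2,x3")
last_digit = run.group(1)  # the digits from earlier iterations are not retained

# Fact 2: with an optional second item, a solo match leaves groups 4-6 unmatched,
# but a fixed replacement template still emits its literal "X" and "O" parts.
two_items = re.compile(r"(x(\d)(,)?)(x(\d)(,)?)?")

def fixed_template(m):
    # Hypothetical template; unmatched groups are rendered as empty strings.
    g = lambda i: m.group(i) or ""
    return "X{},X{},O{},O{}{}".format(g(2), g(5), g(2), g(5), g(6))

mangled = two_items.sub(fixed_template, text)
print(last_digit)  # 3
print(mangled)
```

The pair x1,x2 comes out as wanted, but the solo matches x4 and x6 pick up a stray X and O with no digit, because the template cannot be switched off for the missing second item.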


But "No, you can't do that" is a challenge to a hacker, so I hope I am shown up by a true regex ninja.


Solution with 2 regexes and some code

There are definitely ways to achieve this goal with some code.

Here's a quick and dirty Python hack using two regexes: http://pythonfiddle.com/wip-soln-for-so-q/

This makes use of Python's re.sub(), which can pass each match of one regex to a function, ordered_repl, which returns the replacement string. By using your original regex within ordered_repl we can extract the information we want and get the right order by buffering our lists of Xs and Os.

import re

input_string = "x1,x2,Z3,x4,Z5,x6"

# Matches a whole run of adjacent x<digit> items using a repeating non-capturing group
re1 = re.compile(r"(?:x\d,?)+")
# Your actual matcher, applied to each run to pull out the individual digits
re2 = re.compile(r"(x(\d)(,)?)")

def ordered_repl(m):  # m is the match object for one run
    buf1 = []
    buf2 = []
    # finditer yields a match object for every non-overlapping match inside the run
    for cap_group in re2.finditer(m.group(0)):
        capture = cap_group.group(2)  # the digit
        buf1.append("X%s" % capture)  # buffer the Xs of this run
        buf2.append("O%s" % capture)  # buffer the Os of this run
    # Concatenate the buffers; the trailing comma restores the one the run consumed
    return "%s,%s," % (",".join(buf1), ",".join(buf2))

# Replace each run matched by re1 with the result of ordered_repl,
# then strip the extra comma left after a run at the end of the string
print(re1.sub(ordered_repl, input_string).rstrip(','))

Upvotes: 1
