Wilhelm Sorban

Reputation: 1131

Find and replace variable number of items in string using regex

I have several strings that look like this:

sum({foo, c[0663, 0667, 0673, 0677, 0693, 0697, 0703, 0707]})
sum({foo, c[0663, 0667, 0673, 0677, 0693]})
sum({foo, c[0697, 0703, 0707]})
sum({foo, c[0693, 0697, 0703, 0707]})

I can find all of them, using this regex:

sum\(\{foo, c\[(?:(\d{4})(, )?)+\]\}\)

The problem appears when I need to replace these lines, because each one contains a variable number of four-digit values separated by a comma and a space.

So the output of the first line should look like this:

[1234] 0663 + [1234] 0667 + [1234] 0673 + [1234] 0677 + [1234] 0693 + [1234] 0697 + [1234] 0703 + [1234] 0707

Of the second line:

[1234] 0663 + [1234] 0667 + [1234] 0673 + [1234] 0677 + [1234] 0693

And so on.

So basically, every four-digit value must be replaced by:

[1234] xxxx

("[1234]" is a constant string) (x represents a digit)

and the

, 

(comma space)

must be replaced by

+

(plus sign)

Therefore, there must be no + sign at either the beginning or the end of the line, which is why the four digits are captured separately from the ", " (comma space).

Normally, I would do the replacement pattern like:

[1234] \1

But this inserts [1234] followed by only the last captured value, so the first line would be replaced with:

[1234] 0707

Also, every ", " (comma space) must be replaced with a plus sign.
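The last-capture behaviour described above can be reproduced outside the editor. This is a minimal sketch in Python (an assumption, since the question does not name a tool or language), and the variable names are illustrative only:

```python
import re

# One of the sample lines from the question.
line = "sum({foo, c[0697, 0703, 0707]})"

# The regex from the question, unchanged.
pattern = re.compile(r"sum\(\{foo, c\[(?:(\d{4})(, )?)+\]\}\)")

match = pattern.fullmatch(line)

# A capture group inside a repeated group is overwritten on every
# iteration, so only the final four-digit value is still available.
last_capture = match.group(1)

# Replacing with "[1234] \1" therefore keeps just that last value.
naive = pattern.sub(r"[1234] \1", line)

print(last_capture)
print(naive)
```

The whole line is consumed by a single match, so a single backreference cannot expand into one item per number.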

Upvotes: 3

Views: 1139

Answers (1)

Wiktor Stribiżew

Reputation: 626794

Use a \G-based regex with a conditional replacement pattern:

Find What: (?:\G,\h*|^sum\(\{foo,\h*c\[)(\d{4})(\]\}\))?
Replace With: (?{2}[1234] $1:[1234] $1 + )

Note: If the ]}) must appear at the end of the line, add $ inside the optional group: (\]\}\)$)?

Details:

  • (?:\G,\h*|^sum\(\{foo,\h*c\[) - either a sum({foo, c[ like pattern at the start of a string/line (see ^sum\(\{foo,\h*c\[), or the end of the preceding successful match followed by a , and zero or more horizontal whitespace characters (see \G,\h*)
  • (\d{4}) - Group 1: exactly four digits
  • (\]\}\))? - an optional Group 2: the sequence ]}), one or zero times

The replacement pattern:

  • (?{2} - (conditional replacement pattern start) If Group 2 matched:
    • [1234] $1 - literal [1234] substring and the Group 1 value
    • : - else
    • [1234] $1 + - literal [1234] substring, the Group 1 value and a + literal char sequence
  • ) - end of the conditional replacement.
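
For comparison, the same transformation can be sketched in a scripting language. Python's built-in re module supports neither \G nor conditional replacement patterns, so the sketch below swaps in a different technique: it matches the whole sum(...) expression once and builds the replacement in a callback. The names PREFIX, LINE_RE, expand and rewrite are hypothetical, and [ \t]* stands in for \h*.

```python
import re

PREFIX = "[1234]"  # the constant string from the question

# Group 1 captures the whole comma-separated list of four-digit values.
LINE_RE = re.compile(r"sum\(\{foo,[ \t]*c\[(\d{4}(?:,[ \t]*\d{4})*)\]\}\)")


def expand(match):
    """Turn the captured list into '[1234] xxxx + [1234] yyyy + ...'."""
    codes = re.findall(r"\d{4}", match.group(1))
    # Joining with " + " puts a plus sign only between items,
    # never at the start or the end.
    return " + ".join(f"{PREFIX} {code}" for code in codes)


def rewrite(text):
    """Replace every sum({foo, c[...]}) expression found in text."""
    return LINE_RE.sub(expand, text)


print(rewrite("sum({foo, c[0697, 0703, 0707]})"))
```

Joining the items plays the role of the conditional in the replacement pattern above: the last item gets no trailing " + " without having to test for Group 2.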

Upvotes: 3
