Reputation: 23
I am struggling with a little problem concerning regular expressions.
I want to replace all odd length substrings of a specific character with another substring of the same length but with a different character. All even sequences of the specified character should remain the same.
Simplified example: A string contains the letters a, b and y, and all the odd-length sequences of y's should be replaced by z's:
abyyyab -> abzzzab
Another possible example might be:
ycyayybybcyyyyycyybyyyyyyy
becomes
zczayybzbczzzzzcyybzzzzzzz
I have no problem matching all the sequences of odd length using a regular expression.
Unfortunately I have no idea how to incorporate the length information from these matches into the replacement string. I know I have to use backreferences/capture groups somehow, but even after reading lots of documentation and Stack Overflow articles I still don't know how to pursue the issue correctly.
Concerning possible regex engines, I am working mainly with Emacs or Vim.
In case I have overlooked an easier general solution without a complicated regular expression (e.g. a small and fixed series of simple search and replace commands), this would help too.
Upvotes: 1
Views: 192
Reputation: 7689
Here's how I'd do it in vim:
:s/\vy@<!y(yy)*y@!/\=repeat('z', len(submatch(0)))/g
Explanation:
The regex we're using is \vy@<!y(yy)*y@!. The \v at the beginning turns on the "very magic" option, so we don't have to escape as much. Without it, we would have y\@<!y\(yy\)*y\@!.
The basic idea for this search is that we're looking for a single 'y' (y) followed by a run of pairs of 'y's ((yy)*), which always adds up to an odd number of 'y's. Then we add y@<! to guarantee there isn't a 'y' before our match, and add y@! to guarantee there isn't a 'y' after our match.
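If it helps to see the same pattern outside Vim, here is a rough sketch in Python's re module. This is a translation for illustration only, not part of the Vim solution: y@<! becomes (?<!y), y@! becomes (?!y), and the names ODD_RUN and odd_runs are made up for this example.

```python
import re

# Python spelling of the Vim pattern \vy@<!y(yy)*y@!
#   (?<!y)   no 'y' directly before the match
#   y(?:yy)* one 'y' plus any number of 'y' pairs (odd length)
#   (?!y)    no 'y' directly after the match
ODD_RUN = re.compile(r"(?<!y)y(?:yy)*(?!y)")

def odd_runs(text):
    """Return (start_index, matched_text) for every odd-length run of 'y'."""
    return [(m.start(), m.group(0)) for m in ODD_RUN.finditer(text)]

print(odd_runs("abyyyab"))  # [(2, 'yyy')]
print(odd_runs("ayyb"))     # [] because the run has even length
```

The two lookarounds are what keep an even run from matching on one of its odd-length pieces.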
Then we replace this using an expression, i.e. \=. From :h sub-replace-\=:
*sub-replace-\=* *s/\=*
When the substitute string starts with "\=" the remainder is interpreted as an
expression.
The special meaning for characters as mentioned at |sub-replace-special| does
not apply except for "<CR>". A <NL> character is used as a line break, you
can get one with a double-quote string: "\n". Prepend a backslash to get a
real <NL> character (which will be a NUL in the file).
The "\=" notation can also be used inside the third argument {sub} of
|substitute()| function. In this case, the special meaning for characters as
mentioned at |sub-replace-special| does not apply at all. Especially, <CR> and
<NL> are interpreted not as a line break but as a carriage-return and a
new-line respectively.
When the result is a |List| then the items are joined with separating line
breaks. Thus each item becomes a line, except that they can contain line
breaks themselves.
The whole matched text can be accessed with "submatch(0)". The text matched
with the first pair of () with "submatch(1)". Likewise for further
sub-matches in ().
TL;DR, :s/foo/\=blah replaces foo with blah evaluated as vimscript code. So the code we're evaluating is repeat('z', len(submatch(0))), which simply makes one 'z' for each 'y' we've matched.
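For comparison, the same "compute the replacement from the match" idea can be sketched in Python, where re.sub accepts a function in place of a replacement string. This is only an analogue of the \= expression, not something Vim runs; the function name replace_odd_runs and its parameters are assumptions made for this example, and old is assumed to be a single character.

```python
import re

def replace_odd_runs(text, old="y", new="z"):
    """Replace every odd-length run of `old` with `new` repeated to the same length."""
    o = re.escape(old)  # guard against regex metacharacters in `old`
    pattern = re.compile(f"(?<!{o}){o}(?:{o}{o})*(?!{o})")
    # The lambda plays the role of repeat('z', len(submatch(0))) in the Vim command.
    return pattern.sub(lambda m: new * len(m.group(0)), text)

print(replace_odd_runs("abyyyab"))  # abzzzab
```

Even-length runs never match the pattern, so they pass through unchanged.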
Upvotes: 5