Dan

Reputation: 842

Perl - replacing sequences of identical characters

I am trying to write a regexp that, given a string, finds every sequence of at least three identical characters and replaces it with two of that character. For example, I want to turn the string below:

sstttttrrrrrrriing

into

ssttrriing 

I am thinking of something along the lines of...

$string =~ s/(\D{3,})/substr($1, 0, 2)/e;

But this will not work because:

  1. It doesn't check that the characters are identical: \D matches any non-digit, so \D{3,} can match a sequence of three or more distinct characters.
  2. It only replaces the first match; I need it to replace all matches.

Can anyone help me?
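A quick check of the attempt on the sample string shows how far off it is. This is a minimal sketch; the variable name $attempt is only for illustration. Because \D matches any non-digit, the first match greedily covers the entire all-letter string, so /g would not even get a second chance:

```perl
use strict;
use warnings;

my $string = 'sstttttrrrrrrriing';

# \D{3,} greedily swallows the whole string (every character is a
# non-digit), and substr() keeps only the first two characters of it.
(my $attempt = $string) =~ s/(\D{3,})/substr($1, 0, 2)/e;

print "$attempt\n";    # prints "ss", not "ssttrriing"
```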

Upvotes: 4

Views: 481

Answers (2)

Sparky

Reputation: 8477

$ echo "sssssttttttrrrrriiiinnnnggg" | perl -pe 's/(.)\1+/$1$1/g'
ssttrriinngg
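The same substitution can be written in a script. This is a sketch with illustrative variable names. On the replacement side $1 is preferred over \1, since Perl warns "\1 better written as $1" when warnings are on. It also shows that (.)\1+ and (.)\1{2,} give the same result here, because a run of exactly two characters is simply replaced with itself:

```perl
use strict;
use warnings;

my $input = 'sssssttttttrrrrriiiinnnnggg';

# (.)\1+ matches runs of two or more identical characters;
# replacing them with $1$1 leaves pairs as-is and cuts longer runs to two.
(my $plus = $input) =~ s/(.)\1+/$1$1/g;

# (.)\1{2,} only matches runs of three or more, with the same outcome.
(my $braces = $input) =~ s/(.)\1{2,}/$1$1/g;

print "$plus\n";      # ssttrriinngg
print "$braces\n";    # ssttrriinngg
```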

Upvotes: 3

TLP

Reputation: 67908

You can capture one character, use the backreference \1 to match two or more further copies of it, and then insert the captured character twice with $1$1.

$ perl -plwe 's/(.)\1{2,}/$1$1/g'
sstttttrrrrrrriing
ssttrriing
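If the substitution is needed inside a program rather than as a one-liner, it can be wrapped in a small function. This is a sketch only: squeeze_runs is a hypothetical name, and the /r modifier (which returns the modified copy instead of changing the original) requires Perl 5.14 or later:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of $str in which every run of three
# or more identical characters is cut down to two. /r leaves $str untouched.
sub squeeze_runs {
    my ($str) = @_;
    return $str =~ s/(.)\1{2,}/$1$1/gr;
}

my $result = squeeze_runs('sstttttrrrrrrriing');
print "$result\n";    # ssttrriing
```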

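Or you can use the \K (keep) escape sequence, which keeps everything matched before it out of the replaced text, so only the surplus characters are deleted and nothing has to be re-inserted: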

s/(.)\1\K\1+//g
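To check that the \K form behaves like the re-insert form, both can be run over a few sample words. This is a sketch; the sample words and the %squeezed hash are only for illustration, and \K needs Perl 5.10 or later:

```perl
use strict;
use warnings;

my %squeezed;
for my $word (qw(sstttttrrrrrrriing aaab abbbbba xyz)) {
    # \K discards the first two characters from the match,
    # so only the extra copies after them are replaced with ''.
    (my $keep = $word) =~ s/(.)\1\K\1+//g;

    # The re-insert form, for comparison.
    (my $reinsert = $word) =~ s/(.)\1{2,}/$1$1/g;

    die "mismatch for $word" unless $keep eq $reinsert;
    $squeezed{$word} = $keep;
    print "$word -> $keep\n";
}
```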

Replace the wildcard . with a suitable character or character class if needed. For example, for letters only:

perl -plwe 's/(\pL)\1\K\1+//g'
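The difference between \pL and the plain wildcard shows up on input that mixes letters with punctuation and digits. This is a sketch with a made-up sample string:

```perl
use strict;
use warnings;

my $text = 'heeeey!!!! 2000000';

# \pL matches only letters, so runs of punctuation or digits are kept.
(my $letters_only = $text) =~ s/(\pL)\1\K\1+//g;

# The . wildcard squeezes runs of any character.
(my $any_char = $text) =~ s/(.)\1\K\1+//g;

print "$letters_only\n";    # heey!!!! 2000000
print "$any_char\n";        # heey!! 200
```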

Upvotes: 12
