Michael

Reputation: 821

Regex to replace a repeating string pattern

I need to replace a repeated pattern within a word with a single instance of its basic unit. For example, I have the string "TATATATA" and I want to replace it with "TA". I would probably also only replace patterns with more than 2 repetitions, to avoid altering normal words.

I am trying to do it in Java with the replaceAll method.

Upvotes: 4

Views: 9018

Answers (3)

drew moore

Reputation: 32680

Since you asked for a regex solution:

(\\w)(\\w)(\\1\\2){2,}

(\\w)(\\w) matches any pair of consecutive word characters and stores them in capturing groups 1 and 2 (use (.)(.) instead to match a pair of characters of any type). (\\1\\2) matches that same pair again immediately afterward, and {2,} requires it to repeat two or more additional times, so the pair must occur at least three times in a row. {2,10} would allow between two and ten additional repetitions, inclusive. The backslashes are doubled because the pattern is written as a Java string literal.

// Needs java.util.regex.Matcher and java.util.regex.Pattern
String s = "hello TATATATA world";
Pattern p = Pattern.compile("(\\w)(\\w)(\\1\\2){2,}");
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group()); // prints "TATATATA"
}
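
The snippet above only finds the repeated run. A minimal sketch of the replacement step, assuming the same pattern and substituting groups 1 and 2 back in, could look like this (the class name PairCollapser and the method collapse are made up for illustration):

```java
// Hypothetical helper; the names here are not from the answer.
class PairCollapser {
    // Two word characters, then the same pair at least two more times.
    static final String REPEATED_PAIR = "(\\w)(\\w)(\\1\\2){2,}";

    // Replaces each matching run with one copy of the pair (groups 1 and 2).
    static String collapse(String input) {
        return input.replaceAll(REPEATED_PAIR, "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(collapse("hello TATATATA world")); // hello TA world
        System.out.println(collapse("TATA"));                 // TATA (only two occurrences)
    }
}
```

Note that this only handles units of exactly two characters; a unit such as "GHR" would need a pattern with a variable-length group.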

Upvotes: 1

fge

Reputation: 121760

You would be better off using a precompiled Pattern here rather than String.replaceAll(), which recompiles the regex on every call. For instance:

private static final Pattern PATTERN 
    = Pattern.compile("\\b([A-Z]{2,}?)\\1+\\b");

//...

final Matcher m = PATTERN.matcher(input);
final String ret = m.replaceAll("$1");

Edit: an example (this one also accepts lowercase letters):

public static void main(final String... args)
{
    System.out.println("TATATA GHRGHRGHRGHR"
        .replaceAll("\\b([A-Za-z]{2,}?)\\1+\\b", "$1"));
}

This prints:

TA GHR
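
As written, the pattern also collapses words made of just two occurrences, so "murmur" would become "mur". If the goal from the question (more than 2 repetitions) matters, one possible variation, offered as a sketch rather than as part of the original answer, is to replace \\1+ with \\1{2,}. The class name WordRepeatCollapser is hypothetical:

```java
import java.util.regex.Pattern;

// Hypothetical class; the {2,} quantifier is an assumption based on the question.
class WordRepeatCollapser {
    // A unit of two or more letters occurring at least three times in a row
    // and spanning a whole word (enforced by the \\b anchors).
    private static final Pattern THREE_OR_MORE =
        Pattern.compile("\\b([A-Za-z]{2,}?)\\1{2,}\\b");

    static String collapse(String input) {
        return THREE_OR_MORE.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("TATATA GHRGHRGHRGHR murmur")); // TA GHR murmur
    }
}
```

The pattern is compiled once and reused, which is the point of preferring Pattern over repeated String.replaceAll() calls.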

Upvotes: 1

MightyPork

Reputation: 18881

I think you want this (works for any length of the repeated string):

String result = source.replaceAll("(.+)\\1+", "$1");

Alternatively, to prefer the shortest repeating unit:

String result = source.replaceAll("(.+?)\\1+", "$1");

It first matches a group of one or more characters of any kind, and then the same text again one or more times (using a back-reference within the pattern itself). I tried it and it seems to do the trick.


Example

String source = "HEY HEY duuuuuuude what'''s up? Trololololo yeye .0.0.0";

System.out.println(source.replaceAll("(.+?)\\1+", "$1"));

// HEY dude what's up? Trolo ye .0
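
To make the difference between the two variants concrete, here is a small sketch that runs both on the string from the question. The class and method names are made up for illustration:

```java
// Hypothetical demo class comparing the greedy and reluctant patterns.
class GreedyVsReluctant {
    // Greedy: the group grabs the longest unit that still repeats.
    static String greedy(String s) {
        return s.replaceAll("(.+)\\1+", "$1");
    }

    // Reluctant: the group settles on the shortest unit that repeats.
    static String reluctant(String s) {
        return s.replaceAll("(.+?)\\1+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(greedy("TATATATA"));    // TATA
        System.out.println(reluctant("TATATATA")); // TA
    }
}
```

The greedy version treats "TATATATA" as "TATA" twice, so for the question as asked the reluctant version is probably the one to use.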

Upvotes: 9
