Reputation: 53
I'm trying to build a regex to "reduce" duplicate consecutive substrings from a string in Java. For example, for the following input:
The big black dog big black dog is a friendly friendly dog who lives nearby nearby.
I'd like to get the following output:
The big black dog is a friendly dog who lives nearby.
This is the code I have so far:
String input = "The big black dog big black dog is a friendly friendly dog who lives nearby nearby.";
Pattern dupPattern = Pattern.compile("((\\b\\w+\\b\\s)+)\\1+", Pattern.CASE_INSENSITIVE);
Matcher matcher = dupPattern.matcher(input);
while (matcher.find()) {
    input = input.replace(matcher.group(), matcher.group(1));
}
This works for every duplicate substring except the one at the end of the sentence:
The big black dog is a friendly dog who lives nearby nearby.
I understand that my regex requires whitespace after each word in the substring, so it won't catch a duplicate that is followed by a period instead of a space. I can't find a workaround for this. I have tried rearranging the capture groups, and also changing the regex to accept either whitespace or a period, but that only works if a period follows each copy of the duplicated substring ("nearby.nearby.").
Can somebody point me in the right direction? Ideally the inputs for this method will be short paragraphs and not just one-liners.
Upvotes: 5
Views: 2737
Reputation: 11075
Combine @Thomas Ayoub's answer with @Matt's comment, and repeat the replacement until the string stops changing:
public class Test2 {
    public static void main(String[] args) {
        String input = "The big big black dog big black dog is a friendly friendly dog who lives nearby nearby.";
        String result = input.replaceAll("\\b([ \\w]+)\\1", "$1");
        // One pass can leave nested duplicates behind, so loop until nothing changes.
        while (!input.equals(result)) {
            input = result;
            result = input.replaceAll("\\b([ \\w]+)\\1", "$1");
        }
        System.out.println(result);
    }
}
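If this is going to be called on several paragraphs, the same loop could be wrapped in a method with the pattern compiled once. This is only a sketch of that idea: the class name `DuplicateReducer` and the method `reduce` are made up for illustration, and the regex is the one from the code above.

```java
import java.util.regex.Pattern;

class DuplicateReducer {
    // Same regex as above, compiled once instead of on every replaceAll call.
    private static final Pattern DUPLICATE = Pattern.compile("\\b([ \\w]+)\\1");

    // Hypothetical helper: repeat the replacement until a pass changes nothing.
    static String reduce(String input) {
        String previous;
        String current = input;
        do {
            previous = current;
            current = DUPLICATE.matcher(previous).replaceAll("$1");
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        String input = "The big big black dog big black dog is a friendly friendly dog who lives nearby nearby.";
        System.out.println(reduce(input));
    }
}
```

With the input above, the first pass collapses "big big", "friendly friendly" and "nearby nearby", and the second pass collapses the repeated "big black dog".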
Upvotes: 2
Reputation: 29431
You can use
input.replaceAll("([ \\w]+)\\1", "$1");
See live demo:
import java.util.regex.Pattern;

class Ideone
{
    public static void main (String[] args) throws java.lang.Exception
    {
        String input = "The big black dog big black dog is a friendly friendly dog who lives nearby nearby.";
        // Compile once so the same flags apply to both the check and the replacement.
        Pattern dupPattern = Pattern.compile("([ \\w]+)\\1", Pattern.CASE_INSENSITIVE);
        // Match against the current string on every pass; each replacement shortens it, so the loop ends.
        while (dupPattern.matcher(input).find()) {
            input = dupPattern.matcher(input).replaceAll("$1");
        }
        System.out.println(input);
    }
}
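One caveat, as far as I can tell: because `[ \\w]+` matches any run of letters and spaces, it can also match a repeated run inside a single word, so "hello" would come out as "helo". If that matters for your paragraphs, a possible variant is to match whole words separated by whitespace and require word boundaries on both sides. The sketch below is an assumption about what you want rather than a drop-in fix; the names `WordDuplicateReducer` and `reduce` are hypothetical.

```java
import java.util.regex.Pattern;

class WordDuplicateReducer {
    // Group 1: one or more whole words. It must be followed by whitespace
    // and an exact repeat of itself that ends on a word boundary.
    private static final Pattern REPEATED_WORDS = Pattern.compile(
            "\\b(\\w+(?:\\s+\\w+)*)\\s+\\1\\b", Pattern.CASE_INSENSITIVE);

    // Keep the first copy of each repeated phrase; loop because one pass
    // can leave a longer duplicate that only becomes adjacent afterwards.
    static String reduce(String input) {
        String previous;
        String current = input;
        do {
            previous = current;
            current = REPEATED_WORDS.matcher(previous).replaceAll("$1");
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        System.out.println(reduce(
                "The big black dog big black dog is a friendly friendly dog who lives nearby nearby."));
        System.out.println(reduce("hello papa papa"));
    }
}
```

Since the repeat has to be separated from the first copy by whitespace, doubled letters and words such as "papa" are left alone, and the trailing `\\b` lets the last duplicate be followed by a period.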
Upvotes: 3