Andy Cribbens

Reputation: 1480

In Java 8, what's an elegant way to remove certain duplicate words from a string?

What's an elegant way in Java 8 to remove specific duplicate words from a string, such that:

With a list of words that must not be duplicated: [cat, mat]

Given a String: "A cat sat on a mat and wore a hat A cat sat on a mat and wore a hat"

The result should be: "A cat sat on a mat and wore a hat A sat on a and wore a hat"

NOTE: It's the first occurrence we want to preserve.

Upvotes: 1

Views: 1455

Answers (3)

Lucy Stevens

Reputation: 185

For a more basic solution than the others:

String input = "A cat sat on a mat and wore a hat A cat sat on a mat and wore a hat";
String[] list = {"cat", "mat"};
for (String word : list) {
    // Index just past the first occurrence of the word.
    int index = input.indexOf(word) + word.length();
    // Keep everything up to that point and strip later occurrences from the rest.
    input = input.substring(0, index) + input.substring(index).replace(word, "");
}

Alternatively, using the 'limit' parameter of String.split(), you could replace the loop above with the following:

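for (String word : list) {
    // split() takes a regex, so quote the word (java.util.regex.Pattern);
    // a limit of 2 splits only at the first occurrence.
    String[] split = input.split(Pattern.quote(word), 2);
    if (split.length == 2) { // skip words that do not occur at all
        input = split[0] + word + split[1].replace(word, "");
    }
}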

Both of these output "A cat sat on a mat and wore a hat A  sat on a  and wore a hat", with a double space wherever a word was removed. To collapse those, call input = input.replaceAll(" {2,}", " "); before returning the value. The result must be assigned because strings are immutable.
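Putting the first loop and the whitespace cleanup together might look like the sketch below. The class name FirstOccurrenceKeeper and the helper keepFirst are made up for illustration, and the trailing trim() is an extra assumption to tidy up a removal at the very end of the string. Like the loops above, it matches substrings, so "cat" inside a longer word would also be affected.

```java
public class FirstOccurrenceKeeper {

    // Hypothetical helper: keeps the first occurrence of each word and strips later ones.
    static String keepFirst(String input, String... words) {
        for (String word : words) {
            int first = input.indexOf(word);
            if (first < 0) {
                continue; // word not present, nothing to do
            }
            int index = first + word.length();
            input = input.substring(0, index) + input.substring(index).replace(word, "");
        }
        // Collapse the runs of spaces left behind by the removals.
        return input.replaceAll(" {2,}", " ").trim();
    }

    public static void main(String[] args) {
        String input = "A cat sat on a mat and wore a hat A cat sat on a mat and wore a hat";
        System.out.println(keepFirst(input, "cat", "mat"));
        // A cat sat on a mat and wore a hat A sat on a and wore a hat
    }
}
```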

Upvotes: 3

Jorn Vernee

Reputation: 33845

You could do this:

String input = "A cat sat on a mat and wore a hat A cat sat on a mat and wore a hat";

Set<String> toFilter = Set.of("cat", "mat"); // Java 9's Set.of, for brevity; on Java 8 use new HashSet<>(Arrays.asList("cat", "mat")).
Set<String> seen = new HashSet<>();

String result = Arrays.stream(input.split(" "))
        .filter(s -> !toFilter.contains(s) || seen.add(s))
        .collect(Collectors.joining(" "));

System.out.println(result); // A cat sat on a mat and wore a hat A sat on a and wore a hat

This takes advantage of the fact that seen.add will return false if the word was already in the set.
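To see the return value of add in isolation, and the same pipeline on plain Java 8, here is a small self-contained sketch. The class name SeenFilterDemo and the helper dropRepeats are hypothetical names introduced for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class SeenFilterDemo {

    // Hypothetical helper wrapping the stream pipeline shown above.
    static String dropRepeats(String input, Set<String> toFilter) {
        Set<String> seen = new HashSet<>();
        return Arrays.stream(input.split(" "))
                // Words outside toFilter always pass. The || short-circuits, so
                // seen.add runs only for filtered words and is true only the first time.
                .filter(s -> !toFilter.contains(s) || seen.add(s))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();
        System.out.println(seen.add("cat")); // true: "cat" was not in the set yet
        System.out.println(seen.add("cat")); // false: "cat" is already present

        // Java 8 friendly replacement for Set.of
        Set<String> toFilter = new HashSet<>(Arrays.asList("cat", "mat"));
        System.out.println(dropRepeats("A cat sat on a mat A cat sat on a mat", toFilter));
        // A cat sat on a mat A sat on a
    }
}
```

Because the predicate mutates seen, this sketch relies on the stream being sequential, which Arrays.stream provides.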


In response to some of the comments, which worried that the order of the words might not be preserved:

The documentation for Arrays.stream doesn't explicitly say the returned stream is ordered, but it does mention:

Returns a sequential Stream with the specified array as its source.

An array has a defined ordering, so I'd say it's safe to read this as meaning that the returned stream is also ordered.

Another way you can get an ordered stream is by using Arrays.spliterator and wrapping the result in a stream yourself (since the spliterator will report ORDERED by documentation):

StreamSupport.stream(Arrays.spliterator(input.split(" ")), false)

But currently, Arrays.stream does this too.
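One way to check this on a given JDK is to ask each stream's spliterator whether it reports the ORDERED characteristic. The sketch below does that for both variants; OrderedCheck and isOrdered are hypothetical names, and the result reflects the implementation being run rather than a documented guarantee.

```java
import java.util.Arrays;
import java.util.Spliterator;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class OrderedCheck {

    // True if the stream's spliterator reports the ORDERED characteristic.
    // Note: spliterator() is a terminal operation, so the stream is consumed.
    static boolean isOrdered(Stream<String> stream) {
        return stream.spliterator().hasCharacteristics(Spliterator.ORDERED);
    }

    public static void main(String[] args) {
        String[] tokens = "A cat sat on a mat".split(" ");
        System.out.println(isOrdered(Arrays.stream(tokens)));
        System.out.println(isOrdered(StreamSupport.stream(Arrays.spliterator(tokens), false)));
    }
}
```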


Otherwise, there is always the for-loop fallback:

String[] tokens = input.split(" ");
StringJoiner joiner = new StringJoiner(" ");
for(String s : tokens) {
    if(!toFilter.contains(s) || seen.add(s)) {
        joiner.add(s);
    }
}

String result = joiner.toString();

Upvotes: 4

Nahuel Fouilleul

Reputation: 19315

Updated: here is an example using a positive lookahead. Note that the occurrences removed are the earlier ones, so the last occurrence of each word is kept rather than the first:

\b(cat|mat)\b(?=.*\b\1\b)

In Java:

String input = "A cat sat on a mat and wore a hat A cat sat on a mat and wore a hat";
input = input.replaceAll("\\b(cat|mat)\\b(?=.*\\b\\1\\b)", "");
System.out.println( input );
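Since the question asks to keep the first occurrence, one possible adaptation (not part of the original answer) is to drop the lookahead and walk the matches with Matcher.appendReplacement, tracking the words already seen. The class name KeepFirstRegex and the helper keepFirst are hypothetical, and the wordsRegex argument is assumed to be a plain alternation such as "cat|mat".

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeepFirstRegex {

    // Hypothetical helper: removes every whole-word match except the first one per word.
    static String keepFirst(String input, String wordsRegex) {
        Matcher m = Pattern.compile("\\b(" + wordsRegex + ")\\b").matcher(input);
        Set<String> seen = new HashSet<>();
        StringBuffer sb = new StringBuffer(); // StringBuffer keeps this Java 8 compatible
        while (m.find()) {
            // seen.add is true only the first time a given word is matched.
            String replacement = seen.add(m.group(1)) ? m.group(1) : "";
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "A cat sat on a mat and wore a hat A cat sat on a mat and wore a hat";
        System.out.println(keepFirst(input, "cat|mat"));
    }
}
```

As with the other answers, the removed words leave double spaces behind, which can be collapsed afterwards with replaceAll(" {2,}", " ").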

Upvotes: 3
