Reputation: 1480
What's an elegant way in Java 8 to remove certain specific duplicate words from a string such that:
With a list of words that must not be duplicated: [cat, mat]
Given a String: "A cat sat on a mat and wore a hat A cat sat on a mat and wore a hat"
The result should be: "A cat sat on a mat and wore a hat A sat on a and wore a hat"
NOTE: It's the first occurrence we want to preserve.
Upvotes: 1
Views: 1455
Reputation: 185
For a more basic solution than the others:
String input = "A cat sat on a mat and wore a hat A cat sat on a mat and wore a hat";
String[] list = {"cat", "mat"};
for (String word : list) {
    int index = input.indexOf(word) + word.length();
    input = input.substring(0, index) + input.substring(index).replace(word, "");
}
Or, by utilising the 'limit' parameter of String.split() (which treats the word as a regular expression),
you could replace the loop above with the following:
for (String word : list) {
    String[] split = input.split(word, 2);
    input = split[0] + word + split[1].replace(word, "");
}
Both of these output A cat sat on a mat and wore a hat A  sat on a  and wore a hat
(note the double spaces left where the words were removed). To collapse them, call input = input.replaceAll(" {2,}", " ");
before returning the value.
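To tie the pieces above together, here is a minimal sketch of the indexOf variant with the space clean-up applied at the end. The class name KeepFirstOccurrence and the helper keepFirst are made up for illustration, and the guard for a word that does not occur in the input is my own addition. Like the original loop, it matches substrings, so "cat" inside "category" would be affected too.

```java
class KeepFirstOccurrence {

    // Hypothetical helper: keeps the first occurrence of each word, removes later ones.
    static String keepFirst(String input, String... words) {
        for (String word : words) {
            int first = input.indexOf(word);
            if (first < 0) {
                continue; // word not present, nothing to remove (added guard)
            }
            int index = first + word.length();
            // Keep everything up to and including the first occurrence,
            // then strip the word from the remainder.
            input = input.substring(0, index) + input.substring(index).replace(word, "");
        }
        // Collapse the runs of spaces left behind by the removed words.
        return input.replaceAll(" {2,}", " ");
    }

    public static void main(String[] args) {
        String input = "A cat sat on a mat and wore a hat A cat sat on a mat and wore a hat";
        System.out.println(keepFirst(input, "cat", "mat"));
        // A cat sat on a mat and wore a hat A sat on a and wore a hat
    }
}
```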
Upvotes: 3
Reputation: 33845
You could do this:
String input = "A cat sat on a mat and wore a hat A cat sat on a mat and wore a hat";
Set<String> toFilter = Set.of("cat", "mat"); // Java 9's Set.of, for brevity.
Set<String> seen = new HashSet<>();
String result = Arrays.stream(input.split(" "))
    .filter(s -> !toFilter.contains(s) || seen.add(s))
    .collect(Collectors.joining(" "));
System.out.println(result); // A cat sat on a mat and wore a hat A sat on a and wore a hat
This takes advantage of the fact that seen.add
will return false if the word was already in the set.
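Since the question asks about Java 8, here is a self-contained sketch of the same filter without Set.of. The class name StreamDedup and the helper dedupe are made up for illustration. The lambda mutates the seen set, so this assumes the stream stays sequential.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

class StreamDedup {

    // Hypothetical helper: keeps only the first occurrence of each word in toFilter.
    static String dedupe(String input, Set<String> toFilter) {
        Set<String> seen = new HashSet<>();
        return Arrays.stream(input.split(" "))
                // Words outside toFilter always pass. Words in toFilter pass only
                // the first time, because seen.add returns false on a repeat.
                .filter(s -> !toFilter.contains(s) || seen.add(s))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Java 8 stand-in for Set.of("cat", "mat")
        Set<String> toFilter = new HashSet<>(Arrays.asList("cat", "mat"));
        String input = "A cat sat on a mat and wore a hat A cat sat on a mat and wore a hat";
        System.out.println(dedupe(input, toFilter));
        // A cat sat on a mat and wore a hat A sat on a and wore a hat
    }
}
```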
In response to some of the comments worrying that the order of the words might not be preserved:
The documentation for Arrays.stream
doesn't explicitly say the returned stream is ordered, but it does mention:
Returns a sequential Stream with the specified array as its source.
An array has a defined ordering, so I'd say it's safe to read this as meaning that the returned stream is also ordered.
Another way to get an ordered stream is to use Arrays.spliterator
and wrap the result in a stream yourself (the spliterator is documented to report ORDERED):
StreamSupport.stream(Arrays.spliterator(input.split(" ")), false)
But currently, Arrays.stream
does this too.
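If you want to check this rather than take it on trust, a small sketch like the following asks the spliterators for their ORDERED characteristic. The class name OrderedCheck and its two helpers are made up for illustration, and the result for Arrays.stream reflects current implementations rather than a documented guarantee.

```java
import java.util.Arrays;
import java.util.Spliterator;

class OrderedCheck {

    // Hypothetical helper: does the stream built by Arrays.stream report ORDERED?
    static boolean streamIsOrdered(String[] tokens) {
        return Arrays.stream(tokens)
                .spliterator()
                .hasCharacteristics(Spliterator.ORDERED);
    }

    // Hypothetical helper: does the spliterator from Arrays.spliterator report ORDERED?
    static boolean spliteratorIsOrdered(String[] tokens) {
        return Arrays.spliterator(tokens).hasCharacteristics(Spliterator.ORDERED);
    }

    public static void main(String[] args) {
        String[] tokens = "A cat sat on a mat".split(" ");
        System.out.println(streamIsOrdered(tokens));
        System.out.println(spliteratorIsOrdered(tokens));
    }
}
```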
Otherwise, there is always the for-loop fallback:
String[] tokens = input.split(" ");
StringJoiner joiner = new StringJoiner(" ");
for (String s : tokens) {
    if (!toFilter.contains(s) || seen.add(s)) {
        joiner.add(s);
    }
}
String result = joiner.toString();
Upvotes: 4
Reputation: 19315
Updated: here is an example using a positive lookahead. Note that the occurrences removed are the earlier ones, so the last occurrence of each word is the one kept:
\b(cat|mat)\b(?=.*\b\1\b)
In Java:
String input = "A cat sat on a mat and wore a hat A cat sat on a mat and wore a hat";
input = input.replaceAll("\\b(cat|mat)\\b(?=.*\\b\\1\\b)", "");
System.out.println(input); // A  sat on a  and wore a hat A cat sat on a mat and wore a hat
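The lookahead above keeps the last occurrence, whereas the question asks for the first one to be kept. One way to get that with a regex is to walk the matches and track which words have been seen. This is a sketch of a different technique, not the lookahead approach. The class name RegexKeepFirst and the helper keepFirst are made up for illustration, and the word list is hard-coded in the pattern.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexKeepFirst {

    // Hypothetical helper: keeps the first whole-word match of cat/mat, drops later ones.
    static String keepFirst(String input) {
        Matcher m = Pattern.compile("\\b(cat|mat)\\b").matcher(input);
        Set<String> seen = new HashSet<>();
        StringBuffer out = new StringBuffer(); // Java 8's appendReplacement takes a StringBuffer
        while (m.find()) {
            // First sighting: put the word back. Later sightings: replace with nothing.
            String replacement = seen.add(m.group(1)) ? m.group(1) : "";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "A cat sat on a mat and wore a hat A cat sat on a mat and wore a hat";
        System.out.println(keepFirst(input));
        // A cat sat on a mat and wore a hat A  sat on a  and wore a hat
    }
}
```

As with the other answers, the removed words leave double spaces behind, which replaceAll(" {2,}", " ") would collapse.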
Upvotes: 3