Reputation: 31
I am trying to replace every . (period) that lies inside an alphanumeric word with the keyword XXX, across a large text.
For example: I am trying to match a.b.c.d.e ...
Expected output: I am trying to match aXXXbXXXcXXXdXXXe ...
Pattern I used: (\w+)([\.]+)(\w+)
Actual result: I am trying to match aXXXb.cXXXd.e ...
How can I get the expected output with a regex alone, without using any code/stubs?
Upvotes: 2
Views: 103
Reputation: 135
If it's possible, I'd suggest a slightly different approach to using the regex:
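// Needs java.util.regex.Pattern plus JUnit's @Test and assertEquals.
@Test
public void test_regex_replace() {
    var input = "I am trying to match a.b.c.d.e ...";
    var expectedOutput = "I am trying to match aXXXbXXXcXXXdXXXe ...";
    // One or more "word + dots" groups followed by a final word,
    // so the whole dotted word is consumed as a single match.
    var regex = Pattern.compile("((\\w+)([\\.]+))+(\\w+)");
    // Matcher.replaceAll(Function) is available from Java 9 onwards.
    var output = regex.matcher(input).replaceAll(match -> match.group().replace(".", "XXX"));
    assertEquals(expectedOutput, output);
}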
Notice how I changed the pattern from (\w+)([\.]+)(\w+) to ((\w+)([\.]+))+(\w+), so it matches whole words containing multiple dots. Notice also that it replaces a..b with aXXXXXXb instead of aXXXb; if you want the latter, you must modify the lambda a little bit, e.g.:
regex.matcher(input).replaceAll(match -> match.group().replaceAll("\\.+", "XXX"));
or something more performant, which replaces any run of consecutive dots with a single XXX:
@Test
public void test_regex_replace() {
    final String input = "I am trying to match a.b.c.d.e ...";
    final String expectedOutput = "I am trying to match aXXXbXXXcXXXdXXXe ...";
    final Pattern regex = Pattern.compile("(?:(\\w+)\\.+)+\\w+");
    final String output = regex.matcher(input).replaceAll(match -> {
        final String matchText = match.group();
        final int matchTextLength = matchText.length();
        final var sb = new StringBuilder();
        int lastEnd = 0;
        while (lastEnd < matchTextLength) {
            // Copy the next word (everything up to the next dot) unchanged.
            int endOfWord = lastEnd;
            while (endOfWord < matchTextLength && matchText.charAt(endOfWord) != '.') {
                endOfWord += 1;
            }
            sb.append(matchText, lastEnd, endOfWord);
            // Skip the whole run of dots and emit a single XXX for it.
            int endOfDots = endOfWord;
            while (endOfDots < matchTextLength && matchText.charAt(endOfDots) == '.') {
                endOfDots += 1;
            }
            if (endOfDots != endOfWord) {
                sb.append("XXX");
            }
            lastEnd = endOfDots;
        }
        return sb.toString();
    });
    assertEquals(expectedOutput, output);
}
Matching the whole dotted word at once avoids the problem of a character being needed as both the right side of one dot and the left side of the next. I have not measured the performance, but it does not use any lookarounds, so I expect it to perform rather well.
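To make the overlap problem concrete, here is a minimal sketch that runs the question's pattern next to the whole-word pattern. The replacement string "$1XXX$3" is an assumption (the question does not show which replacement was used), and the class and method names are made up for illustration. It needs Java 9+ for Matcher.replaceAll(Function).

```java
import java.util.regex.Pattern;

class OverlapDemo {
    // Question's pattern. Assumed replacement: keep both words, swap the dots for XXX.
    // Each match consumes the word on the right, so that word cannot also
    // serve as the left-hand word of the following dot.
    static String originalPattern(String input) {
        return input.replaceAll("(\\w+)([\\.]+)(\\w+)", "$1XXX$3");
    }

    // Whole dotted word as one match, dots replaced inside the match.
    static String wholeWordPattern(String input) {
        return Pattern.compile("(?:\\w+\\.+)+\\w+")
                .matcher(input)
                .replaceAll(m -> m.group().replace(".", "XXX"));
    }

    public static void main(String[] args) {
        String input = "I am trying to match a.b.c.d.e ...";
        System.out.println(originalPattern(input));  // every second dot is skipped
        System.out.println(wholeWordPattern(input)); // all inner dots are replaced
    }
}
```

With the question's input, the first call reproduces the "aXXXb.cXXXd.e" result, because after matching a.b the scan resumes at .c and b is no longer available as a left-hand word.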
You mentioned "without using any code/stubs", so this might not fit your problem; in that case you must use lookarounds. Other than these, the only thing I can think of is using \b (the word boundary symbol) in the regex, like so:
@Test
public void test_regex_replace() {
    final String input = "I am trying to match a.b.c.d.e ...";
    final String expectedOutput = "I am trying to match aXXXbXXXcXXXdXXXe ...";
    final String output = input.replaceAll("\\b\\.+\\b", "XXX");
    assertEquals(expectedOutput, output);
}
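A few extra inputs show how the \b version behaves at the edges. This is only a sketch with made-up sample strings and a hypothetical class name. Keep in mind that \b is defined in terms of word characters, and \w includes the underscore.

```java
class WordBoundaryDemo {
    // Same regex as in the test above.
    static String withWordBoundary(String input) {
        return input.replaceAll("\\b\\.+\\b", "XXX");
    }

    public static void main(String[] args) {
        // A run of dots between two word characters collapses into one XXX.
        System.out.println(withWordBoundary("a..b"));
        // The underscore counts as a word character, so this dot is replaced too.
        System.out.println(withWordBoundary("x_.y"));
        // A sentence-ending dot followed by a space has no word boundary after it.
        System.out.println(withWordBoundary("end. Next"));
    }
}
```

If underscores should not count as part of a word, an explicit character class such as [a-zA-Z0-9] in lookarounds is the stricter choice.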
Upvotes: 0
Reputation: 784998
You can use lookarounds:
str = str.replaceAll("(?<=[a-zA-Z0-9])\\.(?=[a-zA-Z0-9])", "XXX");
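As a quick check of why this works: the lookbehind and lookahead only assert that an alphanumeric character sits on each side of the dot and do not consume it, so the same letter can be the right neighbour of one dot and the left neighbour of the next. A minimal sketch follows; the class name, method name and extra sample strings are made up for illustration.

```java
class LookaroundDemo {
    // Only the dot itself is matched and replaced; the neighbours are just checked.
    static String replaceInnerDots(String input) {
        return input.replaceAll("(?<=[a-zA-Z0-9])\\.(?=[a-zA-Z0-9])", "XXX");
    }

    public static void main(String[] args) {
        // Input from the question: all inner dots change, the trailing "..." stays.
        System.out.println(replaceInnerDots("I am trying to match a.b.c.d.e ..."));
        // Leading and trailing dots have no alphanumeric neighbour on one side.
        System.out.println(replaceInnerDots(".a.b."));
        // Neither dot in a double dot has alphanumerics on both sides, so nothing changes.
        System.out.println(replaceInnerDots("a..b"));
    }
}
```

Note that a..b is left untouched by this pattern; if runs of dots should also be replaced, the \\. in the middle would need to become \\.+.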
Upvotes: 1
Reputation: 2953
If you want to change every ., why not do something like this:
str = str.replaceAll("\\.", "XXX");
Or, if you don't want to change a . at the first or last index, use the following:
str = str.replaceAll("\\.", "XXX").replaceAll("^XXX", ".").replaceAll("XXX$", ".");
Upvotes: 0