Reputation: 2696
I'm afraid I've overlooked something obvious. I want to match and replace words, but only if a non-letter character both precedes and follows the word. For example, match kaas in:
<p>Kaas bla bla
bla more kaas, bla
another line adding more kaas to....
The regex \P{L}kaas\P{L} matches (kaas is a variable), but when I replace kaas with cheese, the surrounding characters are replaced too and I get:
<pcheesebla bla
bla morecheese bla
another line adding morecheeseto....
Now I can do:
final String nonChar = "\\P{L}";
final String dutchWord = "kaas";
final String englishWord = "cheese";
final String text = getText();
final Pattern p = Pattern.compile(nonChar + dutchWord + nonChar);
final Matcher match = p.matcher(text);
while (match.find()) {
final int start = match.start();
final int end = match.end();
final String c1 = Character.toString(text.charAt(start));
final String c2 = Character.toString(text.charAt(end - 1));
final String result = match.replaceFirst(c1 + englishWord + c2);
//final String result = match.replaceAll(c1 + englishWord + c2); // not an option: `c1` and `c2` are not the same for every match
}
This works only once, because I can't get the right information out of Matcher to figure out the characters preceding and following kaas. I'm pretty sure I saw something about regex constructs that look forward and backward, I think. I tried using ?: but I keep getting PatternSyntaxExceptions.
What do I need to add to fix this, and how do I do it in Java? Does it make a difference that I'm using \P{L} instead of \w-type character classes for this?
Note: the reason I use \P is that this should also work for non-Western languages, which I need.
Upvotes: 1
Views: 1294
Reputation: 785246
You can use lookarounds here. They are zero-width assertions, so they check the neighbouring characters without consuming them:
(?<!\p{L})kaas(?!\p{L})
This only asserts that kaas is neither preceded nor followed by another Unicode letter. The neighbouring characters are not part of the match, so they survive the replacement.
In Java it will be:
final Pattern p = Pattern.compile("(?<!\\p{L})" + Pattern.quote(dutchWord) + "(?!\\p{L})",
Pattern.CASE_INSENSITIVE);
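To show how that pattern plugs into the replacement, here is a minimal sketch using the sample text from the question. The class name WordReplacer and the helper replaceWord are made up for illustration. I also added Pattern.UNICODE_CASE, on the assumption that case-insensitive matching should work for non-ASCII letters too, and wrapped the replacement in Matcher.quoteReplacement so that a $ or \ in it is taken literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordReplacer {

    // Hypothetical helper: replaces every whole-word occurrence of `word`.
    static String replaceWord(String text, String word, String replacement) {
        // The lookarounds check the neighbours without consuming them,
        // so the characters around the word are left untouched.
        Pattern p = Pattern.compile(
                "(?<!\\p{L})" + Pattern.quote(word) + "(?!\\p{L})",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        // quoteReplacement keeps '$' and '\' in the replacement literal.
        return p.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "<p>Kaas bla bla\n"
                + "bla more kaas, bla\n"
                + "another line adding more kaas to....";
        System.out.println(replaceWord(text, "kaas", "cheese"));
    }
}
```

Because nothing around the word is consumed, a single replaceAll handles all occurrences, and words at the very start or end of the text match as well.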
PS: It is safer to use Pattern.quote for user-provided input, so that regex metacharacters in the word are matched literally.
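A small sketch of what can go wrong without quoting, assuming the word comes from outside your code. QuoteDemo and containsWord are hypothetical names, and the word k..s is a contrived input chosen only because it contains regex metacharacters.

```java
import java.util.regex.Pattern;

public class QuoteDemo {

    // Hypothetical helper: looks for `word` as a whole word, optionally quoting it.
    static boolean containsWord(String text, String word, boolean quote) {
        String body = quote ? Pattern.quote(word) : word;
        Pattern p = Pattern.compile("(?<!\\p{L})" + body + "(?!\\p{L})");
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "more kaas, bla";
        // Unquoted, each '.' matches any character, so "k..s" finds "kaas".
        System.out.println(containsWord(text, "k..s", false)); // true
        // Quoted, only the literal text "k..s" can match.
        System.out.println(containsWord(text, "k..s", true));  // false
    }
}
```

An unquoted word with an unbalanced parenthesis, such as kaas(, would make Pattern.compile throw a PatternSyntaxException instead.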
Upvotes: 1