Danielson

Reputation: 2696

Regex replace all matches but not the first and last character

I'm afraid I've overlooked something obvious. I want to match and replace words, but only when the word has a non-letter character both before and after it. For example, matching kaas in:

<p>Kaas bla bla
bla more kaas, bla 
another line adding more kaas to....

The regex \P{L}kaas\P{L} matches (kaas is a variable), but when I replace kaas with cheese the surrounding characters are replaced as well, and I get:

<pcheesebla bla
bla morecheese bla 
another line adding morecheeseto....

Now I can do:

final String nonChar = "\\P{L}";
final String dutchWord = "kaas";
final String englishWord = "cheese";
final String text = getText();
final Pattern p = Pattern.compile(nonChar + dutchWord + nonChar);
final Matcher match = p.matcher(text);
while (match.find()) {
    final int start = match.start();
    final int end = match.end();
    final String c1 = Character.toString(text.charAt(start));
    final String c2 = Character.toString(text.charAt(end - 1));
    final String result = match.replaceFirst(c1 + englishWord + c2);
    //final String result = match.replaceAll(c1 + englishWord + c2); // not an option: c1 and c2 differ from match to match
}

This works only once, because I can't get the right information out of the Matcher to determine the characters preceding and following kaas. I'm fairly sure I've seen something about looking ahead and behind in a regex. I tried using ?: but I keep getting PatternSyntaxExceptions.

What do I need to add to fix this, and how do I do it in Java? Does it make a difference that I'm using \P{L} instead of \w-style character classes?

Note: I use \P{L} because this also has to work for non-Western languages.

Upvotes: 1

Views: 1294

Answers (1)

anubhava

Reputation: 785246

You can use lookarounds for zero-width assertion here:

(?<!\p{L})kaas(?!\p{L})

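This only asserts that kaas is not preceded or followed by another Unicode letter. Lookarounds are zero-width, so the neighbouring characters are not part of the match and are left untouched by the replacement.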

In Java it will be:

final Pattern p = Pattern.compile("(?<!\\p{L})" + Pattern.quote(dutchWord) + "(?!\\p{L})", 
                   Pattern.CASE_INSENSITIVE); 

PS: It is safer to use Pattern.quote for user-provided input.
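For completeness, here is a minimal sketch of how the whole replacement might look with that pattern. The class name WordReplacer and the method replaceWord are made up for illustration and are not from the question. Two things are additions on my part: Pattern.UNICODE_CASE, since CASE_INSENSITIVE on its own only folds ASCII letters, and Matcher.quoteReplacement, so that a $ or \ in the replacement word is taken literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
final class WordReplacer {

    // Replaces every occurrence of `word` that is not directly preceded
    // or followed by a Unicode letter. The lookarounds consume nothing,
    // so the characters around the word are kept as they are.
    static String replaceWord(String text, String word, String replacement) {
        Pattern p = Pattern.compile(
                "(?<!\\p{L})" + Pattern.quote(word) + "(?!\\p{L})",
                // UNICODE_CASE is an assumption: it extends case folding beyond ASCII.
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        // quoteReplacement keeps '$' and '\' in the replacement literal.
        return p.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "<p>Kaas bla bla\nbla more kaas, bla\nanother line adding more kaas to....";
        System.out.println(replaceWord(text, "kaas", "cheese"));
    }
}
```

Because the lookarounds also succeed at the very start and end of the input, a word at either edge of the text is replaced too, which the original \P{L}kaas\P{L} pattern would miss. Note that the replacement is inserted as given, so Kaas becomes cheese, not Cheese.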

Upvotes: 1
