Poma

Reputation: 8474

Regex replace only inside tag (need help writing regex)

I need to write a regex that replaces a with b, but only inside a <pre> tag.

Example

a <pre> c a <foo> a d </pre> a

Result

a <pre> c b <foo> b d </pre> a

Please help me write an expression for Java's String.replace function. The pre tags are guaranteed not to be nested.

Upvotes: 1

Views: 1612

Answers (3)

Adrian Pronk

Reputation: 13906

I think the best you can do with String.replaceFirst() alone is something like:

String string = ...; // the input text
// Replace one "a" per pass; stop once a pass changes nothing
for (;;)
{
    String original = string;
    string = string.replaceFirst("(<pre>.*?)a(.*?</pre>)", "$1b$2");
    if (original.equals(string))
        break;
}

(EDIT: @Bohemian has noted that the regex above doesn't work correctly: when there is more than one <pre> section, the lazy .*? can run past a </pre> and replace an a that sits between sections. To avoid matching outside a <pre>...</pre> section, it needs to be changed to:
(<pre>(?:(?!</pre>).)*)a((?:(?!<pre>).)*</pre>) (untested). With this change, we no longer need the lazy *? quantifier and can use the more common greedy * quantifier. This is starting to look a lot like my other answer, which I only really meant as a joke!)
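For reference, here is a minimal runnable sketch of the replaceFirst loop, assuming the greedy regex with group 1 closed right before the a: (<pre>(?:(?!</pre>).)*)a((?:(?!<pre>).)*</pre>). The class and method names (PreReplaceLoop, replaceInsidePre) are hypothetical. The pattern is precompiled; per the String.replaceFirst Javadoc, Pattern.compile(regex).matcher(str).replaceFirst(repl) gives the same result as str.replaceFirst(regex, repl).

```java
import java.util.regex.Pattern;

public class PreReplaceLoop {
    // Group 1: "<pre>" plus any text that does not cross a "</pre>".
    // Group 2: the rest of the section up to "</pre>", not crossing another "<pre>".
    private static final Pattern ONE_A_IN_PRE =
            Pattern.compile("(<pre>(?:(?!</pre>).)*)a((?:(?!<pre>).)*</pre>)");

    // Replaces one "a" inside a <pre> section per pass until a pass changes nothing.
    public static String replaceInsidePre(String input) {
        String current = input;
        while (true) {
            String next = ONE_A_IN_PRE.matcher(current).replaceFirst("$1b$2");
            if (next.equals(current)) {
                return current;
            }
            current = next;
        }
    }

    public static void main(String[] args) {
        // prints: a <pre> c b <foo> b d </pre> a
        System.out.println(replaceInsidePre("a <pre> c a <foo> a d </pre> a"));
        // prints: <pre>x</pre> a <pre>y</pre>
        System.out.println(replaceInsidePre("<pre>x</pre> a <pre>y</pre>"));
    }
}
```

The second call is the kind of input the lazy version gets wrong: (<pre>.*?)a(.*?</pre>) would turn the a between the two sections into b, while this version leaves it alone. Like the original loop, it rescans the whole string once per replacement.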

You're better off using a Matcher (the following code is off the top of my head):

import java.util.regex.Pattern;
import java.util.regex.Matcher;

Pattern pattern = Pattern.compile("(?<=<pre>)(.*?)(?=</pre>)");
Matcher matcher = pattern.matcher(string);
StringBuffer replacement = new StringBuffer();

while (matcher.find())
{
    // Copy the text before this match, then append the modified match ourselves
    matcher.appendReplacement(replacement, "");
    // Careful using unknown text in appendReplacement, as any "$n" will cause problems
    replacement.append(matcher.group(1).replace("a", "b"));
}
matcher.appendTail(replacement);
String result = replacement.toString();

Edit: Changed the pattern above so that it does not match the surrounding <pre> and </pre>.
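A self-contained sketch of the Matcher approach. Two choices here go beyond the answer and are assumptions: Pattern.DOTALL is added so a <pre> body may span several lines (without it, . stops at a line break), and Matcher.quoteReplacement is used instead of the appendReplacement(result, "") trick; both keep a $ or \ in the body from being read as a group reference. PreReplaceMatcher and replaceInsidePre are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PreReplaceMatcher {
    // Matches only the text between <pre> and </pre>; the tags sit in zero-width
    // look-arounds, so they are never part of the match.
    // DOTALL (an addition to the answer) lets "." cross line breaks.
    private static final Pattern PRE_BODY =
            Pattern.compile("(?<=<pre>)(.*?)(?=</pre>)", Pattern.DOTALL);

    public static String replaceInsidePre(String input) {
        Matcher matcher = PRE_BODY.matcher(input);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String body = matcher.group(1).replace("a", "b");
            // quoteReplacement escapes '$' and '\' so the body is inserted literally.
            matcher.appendReplacement(result, Matcher.quoteReplacement(body));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        // prints: a <pre> c b <foo> b d </pre> a
        System.out.println(replaceInsidePre("a <pre> c a <foo> a d </pre> a"));
    }
}
```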

Upvotes: 3

Adrian Pronk

Reputation: 13906

Here's a regex that should do the job (I think; I wouldn't bet too much on it passing every test):

String replacement = original.replaceAll(
    "(?<=<pre>(?:(?!</pre>).){0,50})a(?=(?:(?!<pre>).)*</pre>)", 
    "b");

Explanation:

  • (?<=<pre>(?:(?!</pre>).){0,50}) - look-behind for a preceding <pre>, as long as we don't traverse back over a </pre> to find it. Java requires a look-behind with a finite maximum length, so we use {0,50} rather than *; as a result, an a that is more than about 50 characters past its <pre> will not be replaced.
  • a - The character we want to replace
  • (?=(?:(?!<pre>).)*</pre>) - look-ahead for a following </pre>, as long as we don't traverse past a <pre> to find it.
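A small harness for the one-liner, as a sketch (PreReplaceLookaround and replaceInsidePre are hypothetical names). The third call puts the a about 200 characters after its <pre>, well beyond the {0,50} look-behind window, so that a is left unchanged.

```java
public class PreReplaceLookaround {
    // Replace "a" only when a <pre> precedes it (within the bounded look-behind,
    // without crossing </pre>) and a </pre> follows it (without crossing <pre>).
    private static final String A_IN_PRE =
            "(?<=<pre>(?:(?!</pre>).){0,50})a(?=(?:(?!<pre>).)*</pre>)";

    public static String replaceInsidePre(String input) {
        return input.replaceAll(A_IN_PRE, "b");
    }

    public static void main(String[] args) {
        // prints: a <pre> c b <foo> b d </pre> a
        System.out.println(replaceInsidePre("a <pre> c a <foo> a d </pre> a"));
        // prints: <pre>x</pre> a <pre>y</pre>
        System.out.println(replaceInsidePre("<pre>x</pre> a <pre>y</pre>"));

        // Build a 200-space gap so the a is outside the look-behind window.
        StringBuilder gap = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            gap.append(' ');
        }
        // Printed unchanged: the a is too far from its <pre>.
        System.out.println(replaceInsidePre("<pre>" + gap + "a</pre>"));
    }
}
```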

Upvotes: 0

Achintya Jha

Reputation: 12843

Try this:

Pattern pattern = Pattern.compile("<pre>(.+?)</pre>");
java.util.regex.Matcher matcher = pattern.matcher("a <pre> c a <tag> a d </pre> a");
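The answer above stops after creating the Matcher. One way to finish it, assuming Java 9 or later for Matcher.replaceAll(Function<MatchResult, String>), is to rebuild each matched <pre>...</pre> section with a replaced by b in its body. This is a sketch with hypothetical names (PreReplaceFunction, replaceInsidePre); note that .+? skips empty <pre></pre> sections, which is harmless here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PreReplaceFunction {
    // A whole <pre>...</pre> section, with its body captured in group 1.
    private static final Pattern PRE_SECTION = Pattern.compile("<pre>(.+?)</pre>");

    public static String replaceInsidePre(String input) {
        Matcher matcher = PRE_SECTION.matcher(input);
        // Matcher.replaceAll(Function<MatchResult, String>) needs Java 9+.
        // The function's result is a replacement string, so quote it to keep '$' and '\' literal.
        return matcher.replaceAll(match ->
                Matcher.quoteReplacement("<pre>" + match.group(1).replace("a", "b") + "</pre>"));
    }

    public static void main(String[] args) {
        // prints: a <pre> c b <tag> b d </pre> a
        System.out.println(replaceInsidePre("a <pre> c a <tag> a d </pre> a"));
    }
}
```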

Upvotes: -1
