Matches lookbehind / ahead multiple times

Question

Code:

public static void main(String[] args) {
    String mainTag = "HI";
    String replaceTag = "667";
    String text = "92<HI=/>";
    System.out.println(strFormatted(mainTag, replaceTag, text));

    mainTag = "aBc";
    replaceTag = "923";
    text = "abcabc< abcabcde >";
    System.out.println(strFormatted(mainTag, replaceTag, text));
}

private static String strFormatted(String mainTag, String replaceTag, String text) {
    return text.replaceAll("(?i)(?<=<)" + mainTag + "(?=.*>)", replaceTag);
}

So, I want to replace mainTag (a variable) with replaceTag (a variable), but only inside tags (<...>).

In the example above I want to replace every occurrence of the mainTag HI (case-insensitive) inside <...> with 667, but my code replaces only the first occurrence.

Examples:

Input:

92<HI=/>

Expected output:

92<667=/>

(mainTag = "HI", replaceTag = "667")

Input:

abcabc<abcabcde>

Expected output:

abcabc<923923de>

(mainTag = "aBc", replaceTag = "923")

Note: My code is wrong not only because it replaces just one occurrence, but also because it only works when the mainTag comes immediately after the "<". In other words, the lookbehind only covers that one situation.

Rohit Jain · Accepted Answer

You just need a look-ahead here. The idea is to find every mainTag that is followed by a closing >, and after that only by balanced <...> pairs up to the end of the string, and replace it with replaceTag. The following regex would work:

text.replaceAll("(?i)" + mainTag + "(?=[^<>]*>(?:[^<>]*<[^<>]*>)*[^<>]*$)", replaceTag);

Explanation:

(?i)               # Ignore Case
mainTag            # Match mainTag
(?=                # which is followed by
    [^<>]*         # Some 0 or more characters which are not < or >
    >              # Close the bracket (this ensures mainTag sits before a closing bracket)
    (?:            # Start a group (to match pair of bracket)
        [^<>]*     # non-bracket characters
        <          # Start a bracket 
        [^<>]*     # non-bracket characters
        >          # End the bracket
    )*             # Match the pair 0 or more times.
    [^<>]*         # Non-bracket characters 0 or more times.
    $              # End of the string
)
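
As a quick check of the pattern explained above, here is a minimal sketch that runs it against the two inputs from the question. The class and method names are hypothetical, and the inputs are assumed to be 92<HI=/> and abcabc<abcabcde>, as implied by the expected outputs.

```java
final class BalancedTagReplace {

    // Hypothetical helper: replaces mainTag (case-insensitive) only when it is
    // followed by a closing '>' and then by balanced <...> pairs up to the end.
    static String replaceInsideTags(String mainTag, String replaceTag, String text) {
        String regex = "(?i)" + mainTag
                + "(?=[^<>]*>(?:[^<>]*<[^<>]*>)*[^<>]*$)";
        return text.replaceAll(regex, replaceTag);
    }

    public static void main(String[] args) {
        // prints 92<667=/>
        System.out.println(replaceInsideTags("HI", "667", "92<HI=/>"));
        // prints abcabc<923923de>
        System.out.println(replaceInsideTags("aBc", "923", "abcabc<abcabcde>"));
    }
}
```

Note that the $ sits inside the look-ahead. Placed after it, the pattern would require mainTag itself to end the string and would never match.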

The above regex assumes that the brackets are always balanced. For unbalanced brackets, it might give unexpected results. But then regex is not really the right tool for that job.
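
To illustrate the kind of surprise unbalanced input can cause, here is a small sketch with a made-up input containing a stray > after a well-formed tag. The class name, method names and input string are all hypothetical. It compares the balanced-pair look-ahead with a plain "followed by >" look-ahead.

```java
final class UnbalancedBracketDemo {

    // Look-ahead that insists on balanced <...> pairs up to the end of the text.
    static String balanced(String mainTag, String replaceTag, String text) {
        return text.replaceAll(
                "(?i)" + mainTag + "(?=[^<>]*>(?:[^<>]*<[^<>]*>)*[^<>]*$)",
                replaceTag);
    }

    // Look-ahead that only checks that the next bracket is a closing '>'.
    static String simple(String mainTag, String replaceTag, String text) {
        return text.replaceAll("(?i)" + mainTag + "(?=[^<>]*>)", replaceTag);
    }

    public static void main(String[] args) {
        String text = "<abc> 5 > 3"; // made-up input with a stray '>'

        // prints <abc> 5 > 3 (the stray '>' breaks the balance check, nothing is replaced)
        System.out.println(balanced("abc", "923", text));
        // prints <923> 5 > 3
        System.out.println(simple("abc", "923", text));
    }
}
```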

Otherwise a regex as simple as this would also work fine:

"(?i)" + mainTag + "(?=[^<>]*>)"

Which one to use depends on your use case. This one doesn't check for balanced brackets. Try the second one first; if it handles all your scenarios, it's the better choice.
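
A minimal sketch of the simpler pattern wired into a method like the one in the question. The class and method names are hypothetical. The use of Pattern.quote and Matcher.quoteReplacement is an addition not present in the original answer. It assumes mainTag and replaceTag are meant as literal text, so regex metacharacters in either one are not interpreted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SimpleTagReplace {

    // Replaces mainTag (case-insensitive) wherever the next bracket after it is '>'.
    static String strFormatted(String mainTag, String replaceTag, String text) {
        // Assumption: both arguments are literals, so they are quoted.
        String regex = "(?i)" + Pattern.quote(mainTag) + "(?=[^<>]*>)";
        return text.replaceAll(regex, Matcher.quoteReplacement(replaceTag));
    }

    public static void main(String[] args) {
        // prints 92<667=/>
        System.out.println(strFormatted("HI", "667", "92<HI=/>"));
        // prints abcabc<923923de>
        System.out.println(strFormatted("aBc", "923", "abcabc<abcabcde>"));
        // prints <abc 923> (the '.' in "a.c" is matched literally because of the quoting)
        System.out.println(strFormatted("a.c", "923", "<abc a.c>"));
    }
}
```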
