developer033

Reputation: 24894

Matches lookbehind / ahead multiple times

Code:

public static void main(String[] args) {
    String mainTag = "HI";
    String replaceTag = "667";
    String text = "92<HI=/><z==//HIb><cHIhi> ";
    System.out.println(strFormatted(mainTag, replaceTag, text));

    mainTag = "aBc";
    replaceTag = "923";
    text = "<dont replacethis>abcabc<abcabcde>";
    System.out.println(strFormatted(mainTag, replaceTag, text));
}

private static String strFormatted(String mainTag, String replaceTag, String text) {
    return text.replaceAll("(?i)(?<=<)" + mainTag + "(?=.*>)", replaceTag);
}

So, I want to replace mainTag (a variable) with replaceTag (a variable), but only inside tags (<...>).

In the example above I want to replace the mainTag HI (case insensitive) in all occurrences inside <...> with 667, but my code only replaces the first occurrence.

Examples:

92<HI=/><z==//HIb><cHIhi> 

Expected output:

92<667=/><z==//667b><c667667> 

(mainTag = "HI", replaceTag = "667")

<dont replacethis>abcabc<abcabcde>

Expected output:

<dont replacethis>abcabc<923923de>

(mainTag = "aBc", replaceTag = "923");

Note: My code is wrong not only because it replaces only once, but also because it only works if mainTag comes immediately after the "<". In other words, the lookbehind only covers that one situation.

Upvotes: 1

Views: 799

Answers (1)

Rohit Jain

Reputation: 213351

You just need a look-ahead here. The idea is to find every mainTag that is followed by a closing >, and after that only by matching pairs of <> up to the end of the string, and replace it with replaceTag. The following regex would work:

text.replaceAll("(?i)" + mainTag + "(?=[^<>]*>(?:[^<>]*<[^<>]*>)*[^<>]*$)", replaceTag);

Explanation:

(?i)               # Ignore Case
mainTag            # Match mainTag
(?=                # which is followed by
    [^<>]*         # Some 0 or more characters which are not < or >
    >              # Close the bracket (this ensures mainTag sits before a closing bracket)
    (?:            # Start a group (to match pair of bracket)
        [^<>]*     # non-bracket characters
        <          # Start a bracket 
        [^<>]*     # non-bracket characters
        >          # End the bracket
    )*             # Match the pair 0 or more times.
    [^<>]*         # Non-bracket characters 0 or more times.
    $              # End of the string
)

The above regex assumes that brackets are always balanced. For unbalanced brackets, it might give unexpected results. But then, regex is not really the right tool for such a job.
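To make the balanced-bracket assumption concrete, here is a minimal sketch of that regex with the `$` anchor inside the look-ahead. The class name `BalancedTagReplace`, the helper `replaceBalanced`, and the unbalanced sample string are made up for illustration; they are not from the question.

```java
public class BalancedTagReplace {

    // Hypothetical helper: replaces mainTag only when, after the next '>',
    // the rest of the input consists of complete <...> pairs up to the end.
    static String replaceBalanced(String mainTag, String replaceTag, String text) {
        String regex = "(?i)" + mainTag
                + "(?=[^<>]*>(?:[^<>]*<[^<>]*>)*[^<>]*$)";
        return text.replaceAll(regex, replaceTag);
    }

    public static void main(String[] args) {
        // Balanced input from the question.
        // prints: 92<667=/><z==//667b><c667667>
        System.out.println(replaceBalanced("HI", "667", "92<HI=/><z==//HIb><cHIhi> "));

        // Made-up unbalanced input: the stray '>' after the tag makes the
        // look-ahead fail, so nothing is replaced.
        // prints: <HI> a > b
        System.out.println(replaceBalanced("HI", "667", "<HI> a > b"));
    }
}
```

The second call shows the kind of surprise to expect: one stray `>` later in the text is enough to stop a replacement inside an otherwise well-formed tag.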

If you don't need the balanced-bracket check, a regex as simple as this would also work fine:

"(?i)" + mainTag + "(?=[^<>]*>)"

Which one to use depends on your use case. This one doesn't worry about balanced brackets. You can try the second one first; if it fits all your scenarios, it's the best choice.
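As a rough sketch, the simpler look-ahead could be dropped into the question's `strFormatted` like this. The class name `SimpleTagReplace` is made up, and wrapping the tags in `Pattern.quote` and `Matcher.quoteReplacement` is my own addition (not part of the regex above), so that tags containing regex metacharacters or `$` are treated literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleTagReplace {

    // Replaces every case-insensitive occurrence of mainTag that is followed
    // by a '>' with no '<' or '>' in between, i.e. one that sits inside <...>.
    static String strFormatted(String mainTag, String replaceTag, String text) {
        // Pattern.quote / quoteReplacement are an added safety measure (assumption).
        String regex = "(?i)" + Pattern.quote(mainTag) + "(?=[^<>]*>)";
        return text.replaceAll(regex, Matcher.quoteReplacement(replaceTag));
    }

    public static void main(String[] args) {
        // prints: 92<667=/><z==//667b><c667667>
        System.out.println(strFormatted("HI", "667", "92<HI=/><z==//HIb><cHIhi> "));

        // prints: <dont replacethis>abcabc<923923de>
        System.out.println(strFormatted("aBc", "923", "<dont replacethis>abcabc<abcabcde>"));
    }
}
```

Because the look-ahead consumes nothing, `replaceAll` can match several occurrences inside the same tag, which is what the original lookbehind version could not do.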

Upvotes: 3
