Checksum

Reputation: 3240

Java Regex check previous char before splitting

I have a string like this

This:string:must~:be:split:when:previous:char:is:not~:this

I need to split the line with the delimiter ":" but only if the character before the delimiter is NOT "~"

I have the following regex now:

String[] split = str.split(":(?<!~:)");

It works, but since I arrived at it purely by trial and error, I'm not convinced that it's the most efficient way of doing it. This function will also be called frequently on large strings, so performance matters. Is there a more efficient way to do it?

Upvotes: 4

Views: 2157

Answers (3)

Andrew T Finnell

Reputation: 13628

Update: To make the comparison fairer, I also wanted to see the results with a compiled Pattern. The code below now times a non-compiled pattern, a compiled pattern, and my custom method.

Although the custom method doesn't use a regex, it proves to be faster than the regex given.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Wrapper class added so the snippet compiles; the name is arbitrary.
public class SplitBenchmark {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile(":(?<!~:)");
        for (int runs = 0; runs < 4; ++runs) {
            // String.split compiles the regex on every call.
            long start = System.currentTimeMillis();
            for (int index = 0; index < 100000; ++index) {
                "This:string:must~:be:split:when:previous:char:is:not~:this".split(":(?<!~:)");
            }
            long stop = System.currentTimeMillis();
            System.out.println("Run: " + runs + " Regex: " + (stop - start));

            // Same regex, compiled once outside the loop.
            start = System.currentTimeMillis();
            for (int index = 0; index < 100000; ++index) {
                pattern.split("This:string:must~:be:split:when:previous:char:is:not~:this");
            }
            stop = System.currentTimeMillis();
            System.out.println("Run: " + runs + " Compiled Regex: " + (stop - start));

            // Hand-written split, no regex involved.
            start = System.currentTimeMillis();
            for (int index = 0; index < 100000; ++index) {
                specialSplit("This:string:must~:be:split:when:previous:char:is:not~:this");
            }
            stop = System.currentTimeMillis();
            System.out.println("Run: " + runs + " Custom: " + (stop - start));
        }

        for (String s : specialSplit("This:string:must~:be:split:when:previous:char:is:not~:this")) {
            System.out.println(s);
        }
    }

    public static String[] specialSplit(String text) {
        List<String> stringsAfterSplit = new ArrayList<String>();

        StringBuilder splitString = new StringBuilder();
        char previousChar = 0;
        for (int index = 0; index < text.length(); ++index) {
            char charAtIndex = text.charAt(index);
            if (charAtIndex == ':' && previousChar != '~') {
                // Unescaped delimiter: emit the current piece and start a new one.
                stringsAfterSplit.add(splitString.toString());
                splitString.setLength(0);
            } else {
                splitString.append(charAtIndex);
            }
            previousChar = charAtIndex;
        }
        // Emit the last piece; an empty final piece is dropped.
        if (splitString.length() > 0) {
            stringsAfterSplit.add(splitString.toString());
        }
        return stringsAfterSplit.toArray(new String[stringsAfterSplit.size()]);
    }
}

Output

Run: 0 Regex: 468
Run: 0 Compiled Regex: 365
Run: 0 Custom: 169
Run: 1 Regex: 437
Run: 1 Compiled Regex: 363
Run: 1 Custom: 166
Run: 2 Regex: 445
Run: 2 Compiled Regex: 363
Run: 2 Custom: 167
Run: 3 Regex: 436
Run: 3 Compiled Regex: 361
Run: 3 Custom: 167
This
string
must~:be
split
when
previous
char
is
not~:this

Upvotes: 2

Kobi

Reputation: 138037

A slightly simpler approach is this:

(?<!~):

That way you don't match : twice. I doubt you'll see any difference in performance, though. It is also very simple to write without a regular expression: look for the next colon and check whether a tilde precedes it.
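
For reference, here is a minimal sketch of both ideas in Java: the lookbehind pattern compiled once and reused, and a regex-free loop built on indexOf. The class and method names (LookbehindSplit, regexSplit, indexOfSplit) are made up for illustration, and no timings are claimed for either version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LookbehindSplit {
    // Compiled once and reused; matches ':' only when it is not preceded by '~'.
    private static final Pattern UNESCAPED_COLON = Pattern.compile("(?<!~):");

    static String[] regexSplit(String text) {
        return UNESCAPED_COLON.split(text);
    }

    // Regex-free variant: find each colon with indexOf and skip those preceded by '~'.
    static String[] indexOfSplit(String text) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        int colon = text.indexOf(':');
        while (colon >= 0) {
            if (colon == 0 || text.charAt(colon - 1) != '~') {
                parts.add(text.substring(start, colon));
                start = colon + 1;
            }
            colon = text.indexOf(':', colon + 1);
        }
        // Whatever follows the last unescaped colon is the final piece.
        parts.add(text.substring(start));
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String input = "This:string:must~:be:split:when:previous:char:is:not~:this";
        System.out.println(String.join("|", regexSplit(input)));
        System.out.println(String.join("|", indexOfSplit(input)));
    }
}
```

One behavioural difference to be aware of: Pattern.split drops trailing empty strings, while the indexOf loop as written keeps them, so "a:" yields one element from the first and two from the second.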

Upvotes: 5

Chandu

Reputation: 82923

Try this one. [^~]:

Tested in JS
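
A quick way to see how this pattern behaves with Java's split, as a hedged sketch (the class and method names are hypothetical): [^~] is an ordinary character class, so it consumes the character before the colon, and that character is lost from the preceding piece. A colon at the very start of the string is also not matched, since there is no preceding character to consume.

```java
import java.util.regex.Pattern;

public class ConsumingCharClassDemo {
    // [^~] matches, and therefore removes, the character before each colon.
    static String[] charClassSplit(String text) {
        return Pattern.compile("[^~]:").split(text);
    }

    // The lookbehind only checks the previous character without consuming it.
    static String[] lookbehindSplit(String text) {
        return Pattern.compile("(?<!~):").split(text);
    }

    public static void main(String[] args) {
        String input = "This:string:must~:be";
        System.out.println(String.join("|", charClassSplit(input)));  // Thi|strin|must~:be
        System.out.println(String.join("|", lookbehindSplit(input))); // This|string|must~:be
    }
}
```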

Upvotes: 0
