Reputation: 3240
I have a string like this:
This:string:must~:be:split:when:previous:char:is:not~:this
I need to split the line with the delimiter ":" but only if the character before the delimiter is NOT "~"
I have the following regex now:
String[] split = str.split(":(?<!~:)");
It works, but since I arrived at it purely by trial and error, I'm not convinced that it's the most efficient way of doing it. This function will also be called frequently on large strings, so performance matters. What is a more efficient way of doing it?
Upvotes: 4
Views: 2157
Reputation: 13628
Update: To make the comparison fairer, I also wanted to test a compiled Pattern. The code below now times three variants: the non-compiled pattern (String.split), the compiled pattern, and my custom method.
While the custom method doesn't use a regex, it proves to be faster than the regex given.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SplitBenchmark {
    private static final String INPUT =
            "This:string:must~:be:split:when:previous:char:is:not~:this";

    public static void main(String[] args) {
        // Compiled once, outside the timing loops.
        Pattern pattern = Pattern.compile(":(?<!~:)");
        for (int runs = 0; runs < 4; ++runs) {
            // 1. String.split, which handles the regex on every call.
            long start = System.currentTimeMillis();
            for (int index = 0; index < 100000; ++index) {
                INPUT.split(":(?<!~:)");
            }
            long stop = System.currentTimeMillis();
            System.out.println("Run: " + runs + " Regex: " + (stop - start));

            // 2. The precompiled Pattern.
            start = System.currentTimeMillis();
            for (int index = 0; index < 100000; ++index) {
                pattern.split(INPUT);
            }
            stop = System.currentTimeMillis();
            System.out.println("Run: " + runs + " Compiled Regex: " + (stop - start));

            // 3. The hand-written splitter.
            start = System.currentTimeMillis();
            for (int index = 0; index < 100000; ++index) {
                specialSplit(INPUT);
            }
            stop = System.currentTimeMillis();
            System.out.println("Run: " + runs + " Custom: " + (stop - start));
        }
        for (String s : specialSplit(INPUT)) {
            System.out.println(s);
        }
    }

    // Splits on ':' unless the character immediately before it is '~'.
    public static String[] specialSplit(String text) {
        List<String> stringsAfterSplit = new ArrayList<String>();
        StringBuilder splitString = new StringBuilder();
        char previousChar = 0;
        for (int index = 0; index < text.length(); ++index) {
            char charAtIndex = text.charAt(index);
            if (charAtIndex == ':' && previousChar != '~') {
                // Unescaped delimiter: emit the current piece and start a new one.
                stringsAfterSplit.add(splitString.toString());
                splitString.setLength(0);
            } else {
                splitString.append(charAtIndex);
            }
            previousChar = charAtIndex;
        }
        // Add the final piece; an empty final piece is skipped.
        if (splitString.length() > 0) {
            stringsAfterSplit.add(splitString.toString());
        }
        return stringsAfterSplit.toArray(new String[0]);
    }
}
Output
Run: 0 Regex: 468
Run: 0 Compiled Regex: 365
Run: 0 Custom: 169
Run: 1 Regex: 437
Run: 1 Compiled Regex: 363
Run: 1 Custom: 166
Run: 2 Regex: 445
Run: 2 Compiled Regex: 363
Run: 2 Custom: 167
Run: 3 Regex: 436
Run: 3 Compiled Regex: 361
Run: 3 Custom: 167
This
string
must~:be
split
when
previous
char
is
not~:this
Upvotes: 2
Reputation: 138037
A slightly simpler approach is this:
(?<!~):
That way you don't match : twice. I doubt you'll see any difference in performance, though. It is also very simple to write without a regular expression: look for the next colon and check whether a tilde precedes it.
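To make both suggestions concrete, here is a minimal sketch. The class name TildeSplit and the method names are made up for illustration; the first method uses the lookbehind pattern above (compiled once and reused), and the second is one possible way to do the "find the next colon, check for a tilde before it" idea with String.indexOf. I have not benchmarked either.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
public class TildeSplit {
    // Compiled once; the lookbehind asserts that the colon is not preceded by '~'.
    private static final Pattern UNESCAPED_COLON = Pattern.compile("(?<!~):");

    // Regex variant using the simpler lookbehind pattern.
    public static String[] splitWithRegex(String text) {
        return UNESCAPED_COLON.split(text);
    }

    // Non-regex variant: jump from colon to colon and check the character before each one.
    public static String[] splitWithIndexOf(String text) {
        List<String> parts = new ArrayList<>();
        int start = 0;                       // start of the piece being collected
        int colon = text.indexOf(':');
        while (colon >= 0) {
            // Split only if this colon is not preceded by a tilde.
            if (colon == 0 || text.charAt(colon - 1) != '~') {
                parts.add(text.substring(start, colon));
                start = colon + 1;
            }
            colon = text.indexOf(':', colon + 1);
        }
        parts.add(text.substring(start));    // remainder after the last split point
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String input = "This:string:must~:be:split:when:previous:char:is:not~:this";
        System.out.println(Arrays.toString(splitWithRegex(input)));
        System.out.println(Arrays.toString(splitWithIndexOf(input)));
    }
}
```

One difference to be aware of: Pattern.split drops trailing empty strings, while this indexOf version keeps them, so the two only agree when the input does not end with an unescaped colon. Pick whichever behaviour you actually need.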
Upvotes: 5