haventchecked

Reputation: 2016

In Groovy, String.split(regex, int) does not produce the expected results

I'm trying to split a String and keep n + 1 items when there are n delimiters. Many answers here on SO suggest using .split(regex, -1) to retrieve all tokens. However, this does not work when I try it in Groovy:

println ",,,,,,".split(",", -1).length

prints 0

Any idea what I can do to get behavior consistent with the Java method? Calling .toString() on the string first (to convert a GString to a java.lang.String) makes no difference.
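For reference, here is how java.lang.String.split(regex, limit) behaves on the same input when no mixin is involved. This is a minimal Java sketch (Java rather than Groovy, because the JDK method is the one in question). The class name and the tokenCount helper are illustrative only.

```java
import java.util.Arrays;

// Illustrative demo of java.lang.String.split(regex, limit) on the question's input.
final class SplitLimitDemo {

    // Number of tokens String.split produces for the given limit.
    static int tokenCount(String input, String regex, int limit) {
        return input.split(regex, limit).length;
    }

    public static void main(String[] args) {
        String input = ",,,,,,"; // 6 delimiters, so n + 1 = 7 tokens are expected

        // Negative limit: the pattern is applied as many times as possible and
        // trailing empty strings are kept, giving 7 empty tokens.
        System.out.println(tokenCount(input, ",", -1)); // 7

        // Limit 0 (what the one-argument split uses): trailing empty strings are
        // discarded; every token here is empty, so nothing is left.
        System.out.println(tokenCount(input, ",", 0)); // 0
        System.out.println(input.split(",").length);   // 0

        // Shows the seven empty tokens explicitly.
        System.out.println(Arrays.toString(input.split(",", -1)));
    }
}
```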

edit: I also had String.mixin StringUtils in my script. There is no conflicting method signature since StringUtils does not have a .split(regex, int) method defined. Am I using mixin incorrectly? Is there any way to have this play nicely together?


Upvotes: 3

Views: 1003

Answers (1)

pczeus

Reputation: 7867

You are definitely being affected by mixing in the StringUtils class. If you run this:

// Requires Apache Commons Lang 3 (org.apache.commons.lang3) on the classpath
import org.apache.commons.lang3.StringUtils
String str = ",,,,,,"

// Plain java.lang.String.split, before any mixin
println "String..."
println str.split(",",-1).length
println str.split(",").length
def methods = String.metaClass.methods*.name.sort().unique()
println "${methods.size()}:$methods"

// Calling StringUtils.split directly
println "\nStringUtils..."
println StringUtils.split(str, ",").length
println StringUtils.split(str,",",-1).length

// After the mixin, str.split(...) dispatches to StringUtils.split(str, ...)
println "\nStringUtils mixin..."
String.mixin StringUtils
println str.split(",",-1).length
println str.split(",").length
methods = String.metaClass.methods*.name.sort().unique()
println "${methods.size()}:$methods"

You will get the output:

String...
7
0
43:[charAt, codePointAt, codePointBefore, codePointCount, compareTo, compareToIgnoreCase, concat, contains, contentEquals, copyValueOf, endsWith, equals, equalsIgnoreCase, format, getBytes, getChars, getClass, hashCode, indexOf, intern, isEmpty, join, lastIndexOf, length, matches, notify, notifyAll, offsetByCodePoints, regionMatches, replace, replaceAll, replaceFirst, split, startsWith, subSequence, substring, toCharArray, toLowerCase, toString, toUpperCase, trim, valueOf, wait]

StringUtils...
0
0

StringUtils mixin...
0
0
149:[abbreviate, abbreviateMiddle, appendIfMissing, appendIfMissingIgnoreCase, capitalize, center, charAt, chomp, chop, codePointAt, codePointBefore, codePointCount, compareTo, compareToIgnoreCase, concat, contains, containsAny, containsIgnoreCase, containsNone, containsOnly, containsWhitespace, contentEquals, copyValueOf, countMatches, defaultIfBlank, defaultIfEmpty, defaultString, deleteWhitespace, difference, endsWith, endsWithAny, endsWithIgnoreCase, equals, equalsIgnoreCase, format, getBytes, getCR, getChars, getClass, getEMPTY, getFuzzyDistance, getINDEX_NOT_FOUND, getJaroWinklerDistance, getLF, getLevenshteinDistance, getPAD_LIMIT, getSPACE, hashCode, indexOf, indexOfAny, indexOfAnyBut, indexOfDifference, indexOfIgnoreCase, intern, isAllLowerCase, isAllUpperCase, isAlpha, isAlphaSpace, isAlphanumeric, isAlphanumericSpace, isAsciiPrintable, isBlank, isEmpty, isNotBlank, isNotEmpty, isNumeric, isNumericSpace, isWhitespace, join, lastIndexOf, lastIndexOfAny, lastIndexOfIgnoreCase, lastOrdinalIndexOf, left, leftPad, length, lowerCase, matches, mid, normalizeSpace, notify, notifyAll, offsetByCodePoints, ordinalIndexOf, overlay, prependIfMissing, prependIfMissingIgnoreCase, regionMatches, remove, removeEnd, removeEndIgnoreCase, removePattern, removeStart, removeStartIgnoreCase, repeat, replace, replaceAll, replaceChars, replaceEach, replaceEachRepeatedly, replaceFirst, replaceOnce, replacePattern, reverse, reverseDelimited, right, rightPad, setCR, setEMPTY, setINDEX_NOT_FOUND, setLF, setPAD_LIMIT, setSPACE, split, splitByCharacterType, splitByCharacterTypeCamelCase, splitByWholeSeparator, splitByWholeSeparatorPreserveAllTokens, splitPreserveAllTokens, startsWith, startsWithAny, startsWithIgnoreCase, strip, stripAccents, stripEnd, stripStart, stripToEmpty, stripToNull, subSequence, substring, substringAfter, substringAfterLast, substringBefore, substringBeforeLast, substringBetween, substringsBetween, swapCase, toCharArray, toLowerCase, toString, toUpperCase, trim, trimToEmpty, trimToNull, uncapitalize, upperCase, valueOf, wait, wrap]

This shows that the behavior is different without the StringUtils mixin. Plain Groovy returns 7 when you pass -1, and 0 without it, because it is calling the standard Java String.split() method; Java and Groovy return the same result.

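StringUtils returns 0 with or without the -1 parameter. StringUtils.split treats adjacent separators as a single separator and never returns empty tokens, and its third parameter is a maximum array length (zero or negative means no limit) rather than Java's limit, so -1 does not ask it to keep empty tokens.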
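To illustrate why the -1 makes no difference, here is a rough, hand-written Java approximation of the documented StringUtils.split(str, separatorChars, max) behavior. It is not the Commons Lang implementation: the class name is hypothetical, null handling is omitted, and only max <= 0 (no limit) is modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical approximation of StringUtils.split(str, separatorChars, max);
// NOT the Commons Lang source. Only max <= 0 ("no limit") is modeled.
final class CommonsStyleSplitSketch {

    static String[] split(String str, String separatorChars, int max) {
        if (max > 0) {
            throw new UnsupportedOperationException("sketch only models max <= 0");
        }
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : str.toCharArray()) {
            // Every character in separatorChars is a separator (no regex involved).
            if (separatorChars.indexOf(c) >= 0) {
                // Adjacent separators act as one: an empty token is never emitted.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String str = ",,,,,,";
        System.out.println(split(str, ",", -1).length);          // 0
        System.out.println(split(str, ",", 0).length);           // 0
        System.out.println(split("ab,,cd,ef", ",", -1).length);  // 3
    }
}
```

Because empty tokens are dropped regardless of max, an input made only of separators yields an empty array either way.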

In addition, the String metaclass lists 43 distinct method names before the mixin is applied and 149 afterwards; the additional names match those found in StringUtils.

So the two lines after the println "\nStringUtils..." statement produce the same results as the two calls made with StringUtils mixed in: all of them return 0.

A mixin works somewhat like currying: the receiver String 'str' is passed to StringUtils.split() as its first argument. For this reason, each mixin call is equivalent to a direct StringUtils call with one extra leading argument: the mixin call with 2 arguments corresponds to the direct call with 3, and the mixin call with 1 argument corresponds to the direct call with 2.

More specifically, once you apply the mixin:

str.split(",",-1) == StringUtils.split(str, ",", -1)
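The receiver-as-first-argument dispatch can be modeled in plain Java by binding the receiver into a lambda, which is the currying analogy used above. This is only a conceptual sketch: Java has no mixins, HypotheticalUtils is a made-up stand-in for StringUtils, and its split simply drops empty tokens around a literal separator and ignores max, which matches StringUtils.split for this input but not in general.

```java
import java.util.Arrays;
import java.util.function.BiFunction;
import java.util.regex.Pattern;

final class MixinDispatchSketch {

    // Hypothetical utility class in the style of StringUtils: every method is
    // static and takes the target String as its first parameter.
    static final class HypotheticalUtils {
        // Splits on a literal separator and drops empty tokens (max is ignored
        // in this sketch), mimicking StringUtils.split for the question's input.
        static String[] split(String self, String separator, int max) {
            return Arrays.stream(self.split(Pattern.quote(separator), -1))
                    .filter(token -> !token.isEmpty())
                    .toArray(String[]::new);
        }
    }

    // Models what the mixin does for receiver.split(sep, max): the receiver is
    // supplied as the first argument of the static method.
    static BiFunction<String, Integer, String[]> mixedInSplit(String receiver) {
        return (separator, max) -> HypotheticalUtils.split(receiver, separator, max);
    }

    public static void main(String[] args) {
        String str = ",,,,,,";
        // Corresponds to str.split(",", -1) after the mixin ...
        int viaMixin = mixedInSplit(str).apply(",", -1).length;
        // ... which is the same call as the explicit three-argument static form.
        int explicit = HypotheticalUtils.split(str, ",", -1).length;
        System.out.println(viaMixin + " " + explicit); // 0 0
        // The JDK method, by contrast, keeps all 7 (empty) tokens.
        System.out.println(str.split(",", -1).length); // 7
    }
}
```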

Upvotes: 3
