Reputation: 2016
I'm trying to split a String and retain n + 1 items where there are n delimiters. Many solutions here on SO suggest using .split(regex, -1) to retrieve all tokens. This does not work, however, when I try it in Groovy.
println ",,,,,,".split(",", -1).length
prints 0
Any idea what I can do to get behavior consistent with the Java method? Calling .toString() on it makes no difference (converting the GString to a java.lang.String).
Edit: I also had String.mixin StringUtils in my script. There is no conflicting method signature, since StringUtils does not define a .split(regex, int) method. Am I using mixin incorrectly? Is there any way to make these play nicely together?
Upvotes: 3
Views: 1003
Reputation: 7867
You are definitely being affected by mixing in the StringUtils class. If you run this:
import org.apache.commons.lang3.StringUtils
String str = ",,,,,,"
println "String..."
println str.split(",",-1).length
println str.split(",").length
def methods = String.metaClass.methods*.name.sort().unique()
println "$methods.size:$methods"
println "\nStringUtils..."
println StringUtils.split(str, ",").length
println StringUtils.split(str,",",-1).length
println "\nStringUtils mixin..."
String.mixin StringUtils
println str.split(",",-1).length
println str.split(",").length
methods = String.metaClass.methods*.name.sort().unique()
println "$methods.size:$methods"
You will get the output:
String...
7
0
43:[charAt, codePointAt, codePointBefore, codePointCount, compareTo, compareToIgnoreCase, concat, contains, contentEquals, copyValueOf, endsWith, equals, equalsIgnoreCase, format, getBytes, getChars, getClass, hashCode, indexOf, intern, isEmpty, join, lastIndexOf, length, matches, notify, notifyAll, offsetByCodePoints, regionMatches, replace, replaceAll, replaceFirst, split, startsWith, subSequence, substring, toCharArray, toLowerCase, toString, toUpperCase, trim, valueOf, wait]
StringUtils...
0
0
StringUtils mixin...
0
0
149:[abbreviate, abbreviateMiddle, appendIfMissing, appendIfMissingIgnoreCase, capitalize, center, charAt, chomp, chop, codePointAt, codePointBefore, codePointCount, compareTo, compareToIgnoreCase, concat, contains, containsAny, containsIgnoreCase, containsNone, containsOnly, containsWhitespace, contentEquals, copyValueOf, countMatches, defaultIfBlank, defaultIfEmpty, defaultString, deleteWhitespace, difference, endsWith, endsWithAny, endsWithIgnoreCase, equals, equalsIgnoreCase, format, getBytes, getCR, getChars, getClass, getEMPTY, getFuzzyDistance, getINDEX_NOT_FOUND, getJaroWinklerDistance, getLF, getLevenshteinDistance, getPAD_LIMIT, getSPACE, hashCode, indexOf, indexOfAny, indexOfAnyBut, indexOfDifference, indexOfIgnoreCase, intern, isAllLowerCase, isAllUpperCase, isAlpha, isAlphaSpace, isAlphanumeric, isAlphanumericSpace, isAsciiPrintable, isBlank, isEmpty, isNotBlank, isNotEmpty, isNumeric, isNumericSpace, isWhitespace, join, lastIndexOf, lastIndexOfAny, lastIndexOfIgnoreCase, lastOrdinalIndexOf, left, leftPad, length, lowerCase, matches, mid, normalizeSpace, notify, notifyAll, offsetByCodePoints, ordinalIndexOf, overlay, prependIfMissing, prependIfMissingIgnoreCase, regionMatches, remove, removeEnd, removeEndIgnoreCase, removePattern, removeStart, removeStartIgnoreCase, repeat, replace, replaceAll, replaceChars, replaceEach, replaceEachRepeatedly, replaceFirst, replaceOnce, replacePattern, reverse, reverseDelimited, right, rightPad, setCR, setEMPTY, setINDEX_NOT_FOUND, setLF, setPAD_LIMIT, setSPACE, split, splitByCharacterType, splitByCharacterTypeCamelCase, splitByWholeSeparator, splitByWholeSeparatorPreserveAllTokens, splitPreserveAllTokens, startsWith, startsWithAny, startsWithIgnoreCase, strip, stripAccents, stripEnd, stripStart, stripToEmpty, stripToNull, subSequence, substring, substringAfter, substringAfterLast, substringBefore, substringBeforeLast, substringBetween, substringsBetween, swapCase, toCharArray, toLowerCase, toString, toUpperCase, trim, trimToEmpty, trimToNull, uncapitalize, upperCase, valueOf, wait, wrap]
This shows that the behavior differs with and without the StringUtils mixin.
In fact, plain ol' Groovy returns 7 when you pass -1 and 0 without it. That is the result of the standard Java split() method: a negative limit keeps trailing empty strings, while the default limit of 0 discards them. Java and Groovy return the same result.
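For reference, here is a minimal Java sketch of the plain java.lang.String.split behavior described above, with no mixin involved. The class name SplitLimitDemo and the helper countTokens are just for illustration.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Number of tokens java.lang.String.split produces for the given limit.
    static int countTokens(String input, String regex, int limit) {
        return input.split(regex, limit).length;
    }

    public static void main(String[] args) {
        String str = ",,,,,,"; // 6 delimiters
        // Negative limit: trailing empty strings are kept, so n delimiters give n + 1 tokens.
        System.out.println(countTokens(str, ",", -1)); // 7
        // Limit 0 (what one-argument split uses): trailing empty strings are dropped,
        // and here every token is a trailing empty string.
        System.out.println(countTokens(str, ",", 0));  // 0
        // Empty tokens in the middle and at the end survive with a negative limit.
        System.out.println(Arrays.toString("a,,b,".split(",", -1))); // [a, , b, ]
    }
}
```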
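StringUtils returns 0 with or without the -1 parameter. StringUtils.split treats adjacent separators as one separator, so a string made only of commas produces no tokens at all, and its third parameter is a maximum token count (zero or negative means no limit) rather than the limit argument of String.split.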
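To see why collapsing separators yields 0 here, the following Java sketch approximates the documented StringUtils.split(str, separatorChars) rule: any run of separator characters acts as one separator and empty tokens are never emitted. The method splitCollapsing is hypothetical and is not the commons-lang3 source; it only mirrors that documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

public class SeparatorCollapseDemo {
    // Hypothetical approximation of StringUtils.split(str, separatorChars):
    // each char in separatorChars is a delimiter, runs of delimiters collapse,
    // and empty tokens are dropped. Not the library's actual implementation.
    static List<String> splitCollapsing(String str, String separatorChars) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : str.toCharArray()) {
            if (separatorChars.indexOf(c) >= 0) {
                if (current.length() > 0) { // emit only non-empty tokens
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String str = ",,,,,,";
        System.out.println(splitCollapsing(str, ",").size()); // 0
        System.out.println(str.split(",", -1).length);        // 7
        System.out.println(splitCollapsing("a,,b", ","));     // [a, b]
    }
}
```

Because every character in ",,,,,," is a separator, no non-empty token ever accumulates, which is why a max argument of -1 cannot change the outcome.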
In addition, you can see that the String class has 43 distinct method names before the mixin is applied and 149 afterwards; the additional methods match those found in StringUtils.
So, you will notice that the 2 lines after the println "\nStringUtils..." statement output the same results as the 2 lines executed with StringUtils mixed in: all of them return 0.
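A mixin works somewhat like currying: the original String str is passed to the StringUtils.split() method as the first argument. That is also why there is a conflicting signature after all: StringUtils.split(String, String, int) is called as str.split(String, int) once str fills its first parameter, and it takes over from String's own split(String, int). For this reason, the 2 mixin statements, with 2 arguments and 1 argument respectively, are equivalent to the 2 StringUtils statements without the mixin, with 3 and 2 arguments.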
More specifically, once you apply the mixin:
str.split(",",-1) == StringUtils.split(str, ",", -1)
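The receiver-as-first-argument idea can also be seen from the Java side. This is a minimal sketch, not how Groovy implements mixins: an unbound method reference String::split treats the receiver as its first argument, so the instance method split(String, int) has the same shape as a static method taking (String, String, int). The interface Splitter and the helper dropEmptySplit are hypothetical; dropEmptySplit merely drops empty tokens and is not the commons-lang3 implementation.

```java
import java.util.Arrays;

public class ReceiverAsFirstArgDemo {
    // Shape shared by both methods: (receiver, separator, limit) -> tokens.
    @FunctionalInterface
    interface Splitter {
        String[] split(String receiver, String separator, int limit);
    }

    // Hypothetical static helper with the same parameter shape as
    // StringUtils.split(String str, String separatorChars, int max).
    // It just removes empty tokens from String.split's result.
    static String[] dropEmptySplit(String str, String separator, int limit) {
        return Arrays.stream(str.split(separator, limit))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Unbound reference: the String receiver becomes the first argument,
        // so String.split(String, int) fits the 3-argument shape.
        Splitter instanceSplit = String::split;
        // A static method with an explicit leading String parameter fits the same shape.
        Splitter staticSplit = ReceiverAsFirstArgDemo::dropEmptySplit;

        String str = ",,,,,,";
        System.out.println(instanceSplit.split(str, ",", -1).length); // 7
        System.out.println(staticSplit.split(str, ",", -1).length);   // 0
    }
}
```

The call site is identical in both cases; which method is bound to that shape decides the result, much as the mixin decides which split runs for str.split(",", -1).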
Upvotes: 3