Reputation: 1040
I want to skip the first underscore when splitting if the string contains more than 4 underscores. For now, the input has at most 5 underscores. For the key below I need the output A_B, C, D, E, F, which I get with the following code. Is there a better solution? Thanks in advance.
String key = "A_B_C_D_E_F";
int occurrences = StringUtils.countOccurrencesOf(key, "_");
System.out.println(occurrences);
String[] keyValues = null;
if (occurrences == 5) {
    // Temporarily hide the first underscore so it is not treated as a delimiter
    key = key.replaceFirst("_", "-");
    keyValues = StringUtils.tokenizeToStringArray(key, "_");
    // Restore the underscore inside the first token
    keyValues[0] = keyValues[0].replaceFirst("-", "_");
} else {
    keyValues = StringUtils.tokenizeToStringArray(key, "_");
}
for (String keyValue : keyValues) {
    System.out.println(keyValue);
}
Upvotes: 2
Views: 4971
Reputation: 38629
Well, it is relatively "simple":
String str = "A_B_C_D_E_F_G";
String[] result = str.split("(?<!^[^_]*)_|_(?=(?:[^_]*_){0,3}[^_]*$)");
System.out.println(Arrays.toString(result));
Here is a version with comments for better understanding; it can also be used as is:
String str = "A_B_C_D_E_F_G";
String[] result = str.split("(?x) # enable embedded comments \n"
+ " # first alternative splits on all but the first underscore \n"
+ "(?<! # next character should not be preceded by \n"
+ " ^[^_]* # only non-underscores since beginning of input \n"
+ ") # so this matches only if there was an underscore before \n"
+ "_ # underscore \n"
+ "| # alternatively split if an underscore is followed by at most three more underscores to match the less than five underscores case \n"
+ "_ # underscore \n"
+ "(?= # preceding character must be followed by \n"
+ " (?:[^_]*_){0,3} # at most three groups of non-underscores and an underscore \n"
+ " [^_]*$ # only more non-underscores until end of line \n"
+ ")");
System.out.println(Arrays.toString(result));
Upvotes: 2
Reputation: 626709
Although Java does not say that officially, you can use * and + in the lookbehind as they are implemented as limiting quantifiers: * as {0,0x7FFFFFFF} and + as {1,0x7FFFFFFF} (see Regex look-behind without obvious maximum length in Java). So, if your strings are not too long, you can use
String key = "A_B_C_D"; // => [A, B, C, D]
//String key = "A_B_C_D_E_F"; // => [A_B, C, D, E, F]
String[] res = null;
if (key.split("_").length > 4) {
    res = key.split("(?<!^[^_]*)_");
} else {
    res = key.split("_");
}
System.out.println(Arrays.toString(res));
DISCLAIMER: Since this is an exploit of the current Java 8 regex engine, the code may break in the future when the bug is fixed in Java.
Upvotes: 0
Reputation: 65793
I would do it after the split.
public void test() {
    String key = "A_B_C_D_E_F";
    String[] parts = key.split("_");
    if (parts.length >= 5) {
        // Merge the first two parts back together, restoring the underscore between them
        String[] newParts = new String[parts.length - 1];
        newParts[0] = parts[0] + "_" + parts[1];
        // Copy the remaining parts unchanged
        System.arraycopy(parts, 2, newParts, 1, parts.length - 2);
        parts = newParts;
    }
    System.out.println("parts = " + Arrays.toString(parts));
}
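The same split-then-merge idea can be wrapped in a reusable method. The sketch below is a variant under stated assumptions, not the code above: the class name KeySplitter and the method splitKeepingFirst are hypothetical, and the first two parts are merged only when the key has more than 4 underscores (more than 5 parts), which is how the question words the condition.

```java
import java.util.Arrays;

// Hypothetical helper class; the names are illustrative only.
class KeySplitter {

    // Splits on "_", but keeps the first underscore when the key
    // contains more than 4 underscores (that is, more than 5 parts).
    static String[] splitKeepingFirst(String key) {
        String[] parts = key.split("_");
        if (parts.length <= 5) {
            return parts; // 4 underscores or fewer: plain split
        }
        String[] merged = new String[parts.length - 1];
        merged[0] = parts[0] + "_" + parts[1]; // re-join the first two parts
        System.arraycopy(parts, 2, merged, 1, parts.length - 2);
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeepingFirst("A_B_C_D_E_F")));
        System.out.println(Arrays.toString(splitKeepingFirst("A_B_C_D")));
    }
}
```

No regex lookaround is involved, so this does not rely on how the regex engine handles quantifiers inside a lookbehind.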
Upvotes: 0
Reputation: 6274
You can use this regex to split:
String s = "A_B_C_D_E_F";
String[] list = s.split("(?<=_[A-Z])_");
Output:
[A_B, C, D, E, F]
The idea is to match only the _ characters that are preceded by "_[A-Z]", which effectively skips only the first one. If the parts between the "_" in your strings have a different format, you have to replace [A-Z] with the appropriate regex.
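As an illustration of swapping in a different sub-pattern, here is a minimal sketch for parts that are longer than one character. It assumes each part is 1 to 32 characters long and contains no underscore; the bound of 32, the class name LookbehindSplit and the method splitAfterFirst are all hypothetical choices. A bounded quantifier is used so that the lookbehind has an obvious maximum length.

```java
import java.util.Arrays;

// Hypothetical demo class; the name and the length bound are illustrative.
class LookbehindSplit {

    // Splits on every "_" that has an earlier "_" followed by 1..32
    // non-underscore characters directly before it, so the first "_"
    // in the key is never used as a split point.
    static String[] splitAfterFirst(String key) {
        return key.split("(?<=_[^_]{1,32})_");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAfterFirst("key_part_one_two_three")));
    }
}
```

Like the pattern above, this keeps the first underscore regardless of how many underscores the key contains, so a count check is still needed if short keys should be split at every underscore.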
Upvotes: 2
Reputation: 784958
You can use this regex based on \G and use matching instead of splitting:
String str = "A_B_C_D_E_F";
Pattern p = Pattern.compile("(^[^_]*_[^_]+|\\G[^_]+)(?:_|$)");
Matcher m = p.matcher(str);
List<String> resultArr = new ArrayList<>();
while (m.find()) {
    resultArr.add(m.group(1));
}
System.err.println(resultArr);
\G asserts position at the end of the previous match or the start of the string for the first match.
Output:
[A_B, C, D, E, F]
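To see the effect of \G in isolation, here is a small sketch that is separate from the pattern above: it collects digits only as long as each match starts exactly where the previous one ended. The class name ContinuationDemo and the method leadingDigits are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class ContinuationDemo {

    // Returns the digits at the start of the input, one per match.
    // \G forces every match to begin where the previous match ended
    // (or at the start of the input for the first match), so the scan
    // stops at the first non-digit instead of skipping over it.
    static List<String> leadingDigits(String input) {
        Matcher m = Pattern.compile("\\G\\d").matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(leadingDigits("123a45")); // the digits after 'a' are not reached
    }
}
```

Without \G the same loop would also pick up the 4 and the 5, because find() is otherwise free to skip ahead to the next position that matches.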
Upvotes: 0