Abdul

Reputation: 1040

Skip first occurrence and split the string in Java

I want to skip the first occurrence of the underscore when the string contains more than four of them. For now, a string will have at most five underscores. For the input below I need the output A_B, C, D, E, F, which I currently produce with the following code. Is there a better solution? Thanks in advance.

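// StringUtils here is Spring's org.springframework.util.StringUtils
String key = "A_B_C_D_E_F";
int occurrences = StringUtils.countOccurrencesOf(key, "_");
System.out.println(occurrences);
String[] keyValues = null;
if(occurrences == 5){
    // Temporarily hide the first underscore so the tokenizer does not split on it
    key = key.replaceFirst("_", "-");
    keyValues = StringUtils.tokenizeToStringArray(key, "_");
    // Restore the underscore in the first token
    keyValues[0] = keyValues[0].replaceFirst("-", "_");
}else{
    keyValues = StringUtils.tokenizeToStringArray(key, "_");
}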

for(String keyValue : keyValues){
    System.out.println(keyValue);
}

Upvotes: 2

Views: 4971

Answers (5)

Vampire

Reputation: 38629

Well, it is relatively "simple":

String str = "A_B_C_D_E_F_G";
String[] result = str.split("(?<!^[^_]*)_|_(?=(?:[^_]*_){0,3}[^_]*$)");
System.out.println(Arrays.toString(result));

Here is a version with comments for better understanding; it can also be used as is:

String str = "A_B_C_D_E_F_G";
String[] result = str.split("(?x)                  # enable embedded comments \n"
                            + "                    # first alternative splits on all but the first underscore \n"
                            + "(?<!                # next character should not be preceded by \n"
                            + "    ^[^_]*          #     only non-underscores since beginning of input \n"
                            + ")                   # so this matches only if there was an underscore before \n"
                            + "_                   # underscore \n"
                            + "|                   # alternatively split if an underscore is followed by at most three more underscores to match the less than five underscores case \n"
                            + "_                   # underscore \n"
                            + "(?=                 # preceding character must be followed by \n"
                            + "    (?:[^_]*_){0,3} #     at most three groups of non-underscores and an underscore \n"
                            + "    [^_]*$          #     only more non-underscores until end of input \n"
                            + ")");
System.out.println(Arrays.toString(result));
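
To see why both alternatives are needed, here is a minimal sketch that runs only the lookahead alternative on its own. The class and method names (LookaheadOnlyDemo, splitLastFour) are made up for illustration. On its own, the lookahead splits on the last four underscores at most, which already gives the desired result for up to five underscores but leaves too much joined together for longer inputs. That gap is what the lookbehind alternative covers.

```java
import java.util.Arrays;

public class LookaheadOnlyDemo {
    // Second alternative of the pattern above, taken on its own:
    // an underscore that is followed by at most three more underscores.
    static final String LAST_FOUR = "_(?=(?:[^_]*_){0,3}[^_]*$)";

    // Hypothetical helper: split using only the lookahead alternative.
    static String[] splitLastFour(String s) {
        return s.split(LAST_FOUR);
    }

    public static void main(String[] args) {
        // Two underscores: both are within the last four, so a full split.
        System.out.println(Arrays.toString(splitLastFour("A_B_C")));         // [A, B, C]
        // Five underscores: the first one has four more after it and is kept.
        System.out.println(Arrays.toString(splitLastFour("A_B_C_D_E_F")));   // [A_B, C, D, E, F]
        // Six underscores: the first two are kept, hence the extra alternative.
        System.out.println(Arrays.toString(splitLastFour("A_B_C_D_E_F_G"))); // [A_B_C, D, E, F, G]
    }
}
```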

Upvotes: 2

Wiktor Stribiżew

Reputation: 626709

Although Java does not document this officially, you can use * and + in a lookbehind, because they are implemented as limiting quantifiers: * as {0,0x7FFFFFFF} and + as {1,0x7FFFFFFF} (see Regex look-behind without obvious maximum length in Java). So, if your strings are not too long, you can use

String key = "A_B_C_D";       // => [A, B, C, D]
//String key = "A_B_C_D_E_F"; // => [A_B, C, D, E, F]
String[] res = null;
if (key.split("_").length > 5) { // more than 4 underscores means more than 5 parts
    res = key.split("(?<!^[^_]*)_");
} else {
    res = key.split("_");
}
System.out.println(Arrays.toString(res));


DISCLAIMER: Since this relies on undocumented behavior of the current Java 8 regex engine, the code may break in a future Java version if that behavior changes.
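
If you would rather not depend on that behavior, one option is to give the lookbehind an explicit upper bound, which Java accepts. The sketch below assumes the first token is never longer than 100 characters. The bound, the class name and the splitKey helper are all my own choices for illustration, and the underscore count follows the "more than 4" rule from the question.

```java
import java.util.Arrays;

public class BoundedLookbehindDemo {
    // Assumption: the first token is at most 100 characters long.
    // Matches every underscore except the first one in the input.
    static final String ALL_BUT_FIRST = "(?<!^[^_]{0,100})_";

    // Hypothetical helper: keep the first underscore only when
    // the key contains more than four underscores.
    static String[] splitKey(String key) {
        long underscores = key.chars().filter(c -> c == '_').count();
        return underscores > 4 ? key.split(ALL_BUT_FIRST) : key.split("_");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKey("A_B_C_D")));     // [A, B, C, D]
        System.out.println(Arrays.toString(splitKey("A_B_C_D_E")));   // [A, B, C, D, E]
        System.out.println(Arrays.toString(splitKey("A_B_C_D_E_F"))); // [A_B, C, D, E, F]
    }
}
```

If a first token can exceed the chosen bound, the first underscore would be split like any other, so pick the bound to fit your data.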

Upvotes: 0

OldCurmudgeon

Reputation: 65793

I would do it after the split.

public void test() {
    String key = "A_B_C_D_E_F";
    String[] parts = key.split("_");
    if (parts.length > 5) { // more than 4 underscores
        String[] newParts = new String[parts.length - 1];
        // Re-join the first two parts with the original underscore
        newParts[0] = parts[0] + "_" + parts[1];
        System.arraycopy(parts, 2, newParts, 1, parts.length - 2);
        parts = newParts;
    }
    System.out.println("parts = " + Arrays.toString(parts));
}

Upvotes: 0

Maljam

Reputation: 6274

You can use this regex to split:

String s = "A_B_C_D_E_F";
String[] list = s.split("(?<=_[A-Z])_");

Output:

[A_B, C, D, E, F]

The idea is to match only the underscores that are preceded by "_[A-Z]", which effectively skips only the first one.

If the tokens between the underscores have a different format in your strings, you have to replace [A-Z] with the appropriate regex.
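
As a rough sketch of that replacement, the version below allows tokens of 1 to 50 non-underscore characters instead of a single uppercase letter. The length limit of 50, the class name and the helper name are my own assumptions. Java's lookbehind needs a bounded length, which is why this uses {1,50} rather than +.

```java
import java.util.Arrays;

public class SkipFirstUnderscoreDemo {
    // Assumption: every token is 1 to 50 characters long and contains no underscore.
    // An underscore matches only if an earlier underscore plus one token precedes it.
    static final String NOT_FIRST = "(?<=_[^_]{1,50})_";

    // Hypothetical helper wrapping the split.
    static String[] split(String s) {
        return s.split(NOT_FIRST);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("key_value_foo_bar"))); // [key_value, foo, bar]
        System.out.println(Arrays.toString(split("A_B_C")));             // [A_B, C]
    }
}
```

Note that this always keeps the first underscore, regardless of how many underscores the string contains, so the "more than 4" check from the question still has to be done separately.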

Upvotes: 2

anubhava

Reputation: 784958

You can use this regex based on \G and, instead of splitting, use matching:

String str = "A_B_C_D_E_F";
Pattern p = Pattern.compile("(^[^_]*_[^_]+|\\G[^_]+)(?:_|$)");
Matcher m = p.matcher(str);
List<String> resultArr = new ArrayList<>();
while (m.find()) {
    resultArr.add( m.group(1) );
}
System.err.println(resultArr);

\G asserts position at the end of the previous match or the start of the string for the first match.

Output:

[A_B, C, D, E, F]
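
For reuse, the same matching loop could be wrapped in a small method, roughly as sketched here with the imports included. The class name GAnchorDemo and the method name tokens are made up for this example; the pattern is the one from above. Like the snippet above, it always keeps the first underscore and assumes there are no empty tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GAnchorDemo {
    // First alternative: the leading "X_Y" pair at the start of the input.
    // Second alternative: one token starting exactly where the previous match ended.
    private static final Pattern TOKEN =
            Pattern.compile("(^[^_]*_[^_]+|\\G[^_]+)(?:_|$)");

    // Hypothetical helper: collect group 1 of every consecutive match.
    static List<String> tokens(String s) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("A_B_C_D_E_F")); // [A_B, C, D, E, F]
        System.out.println(tokens("A_B"));         // [A_B]
        System.out.println(tokens("A"));           // [A]
    }
}
```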


Upvotes: 0
