wannaBeDev

Reputation: 739

Split at every i-th and j-th char

I need to split a string alternately after every i-th and j-th character, where i and j are input parameters. For example, given the input

String s = "1234567890abcdef";
int i = 2;
int j = 3;

I want my output to be an array of:

[12, 345, 67, 890, ab, cde, f]

I found a compact regex that splits at every n-th char. Here is an example for n = 3, using "(?<=\\G...)" or "(?<=\\G.{3})":

String s = "1234567890abcdef";
int n = 3;
System.out.println(Arrays.toString(s.split("(?<=\\G.{"+n+"})")));

//output: [123, 456, 789, 0ab, cde, f]

How can I modify the above regex to split at every 2nd and 3rd char alternately?

A naive chaining like "(?<=\\G.{2})(?<=\\G.{3})" did not work.

Upvotes: 4

Views: 124

Answers (4)

dshelya

Reputation: 468

There is a somewhat hacky way to split() using regex, but as @horcrux mentioned:

every match should be aware of the pattern previously matched

You would have to:

a) First insert an anchor for the later lookarounds by adding an "unlikely" character or string (e.g. a line break) after every i + j characters:

s = s.replaceAll("(.{5})", "$1\n");

Your string then becomes 12345\n67890\nabcde\nf

b) Now you can split using lookarounds:

String[] result = s.split("(?<=\\G.{2})(?=.{3}\n)|\n");

Here you look for a zero-length match that has i characters on its left, (?<=\G.{2}), and is followed by j characters and then your "special" pattern; if no such position is found, you match the "special" pattern itself.

This alternates between splitting at position i inside each block and splitting on the "special" pattern that ends the block.

Complete one-liner, using the hash # as the special pattern (for educational purposes only):

System.out.println(Arrays.toString(s.replaceAll("(.{"+(i+j)+"})", "$1#").split("(?<=\\G.{"+i+"})(?=.{"+j+"}#)|#")));
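For reuse, the two steps above can be wrapped roughly like this. This is only a sketch: the class name PlaceholderSplit, the method name and the marker parameter are made up for illustration, and it assumes i and j are positive, the input contains no line terminators, and the marker character never occurs in the input.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderSplit {
    // Hypothetical helper (name and signature are illustrative only).
    // Assumes i >= 1, j >= 1 and that `marker` does not occur in `s`.
    static String[] split(String s, int i, int j, char marker) {
        if (s.indexOf(marker) >= 0) {
            throw new IllegalArgumentException("marker must not occur in the input");
        }
        String markerRegex = Pattern.quote(String.valueOf(marker));
        String markerReplacement = Matcher.quoteReplacement(String.valueOf(marker));

        // Step a: append the marker after every complete block of i + j characters.
        String marked = s.replaceAll("(.{" + (i + j) + "})", "$1" + markerReplacement);

        // Step b: split i characters into each complete block, or on the marker itself.
        String regex = "(?<=\\G.{" + i + "})(?=.{" + j + "}" + markerRegex + ")|" + markerRegex;
        return marked.split(regex);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("1234567890abcdef", 2, 3, '#')));
        // prints [12, 345, 67, 890, ab, cde, f]
    }
}
```

One caveat with this trick: a trailing incomplete block is never split, because the lookahead needs the marker. For example "12345678" comes out as [12, 345, 678], not [12, 345, 67, 8].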

Upvotes: 2

logi-kal

Reputation: 7880

I don't think you can do this with split(), because every match should be aware of the pattern previously matched.
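To see why the naive chaining from the question cannot work: both lookbehinds are checked at the same position, so \G (the end of the previous match) would have to be 2 and 3 characters back at the same time. That never holds, so the regex never matches and split() returns the input unchanged. A small sketch (the class and method names are made up for the demo):

```java
import java.util.Arrays;

class NaiveChainDemo {
    // Hypothetical demo method: applies the chained lookbehinds from the question.
    static String[] naiveSplit(String s) {
        // Both assertions apply to the same position, so they can never both succeed.
        return s.split("(?<=\\G.{2})(?<=\\G.{3})");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(naiveSplit("1234567890abcdef")));
        // prints [1234567890abcdef]
    }
}
```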

If you don't want to manually iterate over the string's characters, you can use something like this:

Matcher m = Pattern.compile("(.{0,2})(.{0,3})").matcher("1234567890abcdef");
List<String> list = new ArrayList<>();
while (m.find()) {
  for (int i = 1; i <= 2; i++) {
    if (!m.group(i).isEmpty()) {
      list.add(m.group(i));
    }
  }
}
System.out.println(list);  // prints [12, 345, 67, 890, ab, cde, f]
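Since i and j come from input parameters in the question, the same idea can be parameterized along these lines. Treat it as a sketch: the class and method names are invented, i and j are assumed to be positive, and Pattern.DOTALL is my addition so that line breaks in the input count as ordinary characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherSplit {
    // Hypothetical helper (name and signature are illustrative only).
    // Assumes i >= 1 and j >= 1.
    static List<String> split(String s, int i, int j) {
        // Group 1 takes up to i characters, group 2 up to j characters.
        Pattern p = Pattern.compile("(.{0," + i + "})(.{0," + j + "})", Pattern.DOTALL);
        Matcher m = p.matcher(s);
        List<String> parts = new ArrayList<>();
        while (m.find()) {
            for (int g = 1; g <= 2; g++) {
                // Skip the empty groups produced at the end of the input.
                if (!m.group(g).isEmpty()) {
                    parts.add(m.group(g));
                }
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("1234567890abcdef", 2, 3));
        // prints [12, 345, 67, 890, ab, cde, f]
    }
}
```

Unlike a split()-based approach, this also cuts a trailing partial block, e.g. "12345678" gives [12, 345, 67, 8].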

Upvotes: 5

logi-kal

Reputation: 7880

Here is another simple solution which doesn't make use of regular expressions:

String s = "1234567890abcdef";
int strLen = s.length();
List<String> list = new ArrayList<>();
for (int lastIndex = 0; lastIndex < strLen;) {
    int numChars = list.size() % 2 == 0 ? 2 : 3; // this alternates substrings of length 2 and 3
    if (strLen - lastIndex < numChars)
        list.add(s.substring(lastIndex));
    else
        list.add(s.substring(lastIndex, lastIndex+numChars));
    lastIndex += numChars;
}
System.out.println(list);  // prints [12, 345, 67, 890, ab, cde, f]

Upvotes: 2

Most Noble Rabbit

Reputation: 2776

An O(n) solution that iterates over the characters:

private static List<String> splitByPattern(String str, List<Integer> pattern) {
    int currentPatternIndex = 0;
    int iterationsTillNextSplit = pattern.get(currentPatternIndex);
    StringBuilder stringBuilder = new StringBuilder();
    List<String> strs = new ArrayList<>();

    for (char c : str.toCharArray()) {
        if (iterationsTillNextSplit == 0) { // Time to split
            strs.add(stringBuilder.toString());
            stringBuilder = new StringBuilder();
            iterationsTillNextSplit = pattern.get(++currentPatternIndex % pattern.size());
        }

        stringBuilder.append(c);
        iterationsTillNextSplit--;
    }

    strs.add(stringBuilder.toString());

    return strs;
}

Usage:

System.out.println(splitByPattern("1234567890abcdef", Arrays.asList(2, 3)));

Output:

[12, 345, 67, 890, ab, cde, f]

Upvotes: 2
