uacnix

Reputation: 47

Split by a comma that is not inside parentheses, skipping anything inside them

I know this might be yet another regex question, but although I searched, I couldn't find a clear answer. Here is my problem: I have a string like this:

{1,2,{3,{4},5},{5,6}}

I'm removing the outermost braces (they come from the input and I don't need them), so now I have this:

1,2,{3,{4},5},{5,6}

Now I need to split this string into an array of elements, treating everything inside the braces as one "seamless" element:

Arr[0]    1
Arr[1]    2
Arr[2]    {3,{4},5}
Arr[3]    {5,6}

I have tried using a lookahead, but so far I'm failing (miserably). What would be the neatest way to handle this with a regex?

Upvotes: 3

Views: 1710

Answers (3)

timekeeper

Reputation: 718

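This is close to the requirement but not complete: one comma (the one right after {3) is still split incorrectly. I'm running out of time and will finish the rest later.
Regex: ,(?=[^}]*(?:{|$))
It splits on a comma only when a { (or the end of the string) appears before the next }. To test the regex, go to http://regexr.com/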


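To implement this pattern in Java, there is a slight difference: a backslash is added before { and } (the { is the one Java's regex engine rejects when unescaped), and inside a Java string literal each backslash must itself be doubled.

Hence, the regex as a Java string literal: ,(?=[^\\}]*(?:\\{|$))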
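A quick way to see why the escaping matters is to compile both versions with java.util.regex. In the sketch below (the class and method names are hypothetical, for illustration only), the pattern copied straight from regexr fails with a PatternSyntaxException, because outside a character class Java reads an unescaped { as the start of a {n,m} quantifier, while the escaped version compiles.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class BraceEscapingDemo {
    // Returns true if the given pattern string compiles with java.util.regex
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // Outside a character class, an unescaped '{' not followed by a digit
            // is treated as a malformed {n,m} quantifier and rejected
            System.out.println("Rejected: " + e.getDescription());
            return false;
        }
    }

    public static void main(String[] args) {
        // Pattern as typed on regexr, braces unescaped
        System.out.println(compiles(",(?=[^}]*(?:{|$))"));
        // Same pattern with braces escaped (each \\ in the literal is one \ in the regex)
        System.out.println(compiles(",(?=[^\\}]*(?:\\{|$))"));
    }
}
```

The } inside [^}] is harmless either way; escaping it just keeps the pattern symmetric.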

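String numbers = "{1,2,{3,{4},5},{5,6}}";
// Strip the outermost braces
numbers = numbers.substring(1, numbers.length() - 1);
String[] separatedValues = numbers.split(",(?=[^\\}]*(?:\\{|$))");
// Prints [1, 2, {3, {4},5}, {5,6}]: the comma after "{3" is still split, as noted above
System.out.println(java.util.Arrays.toString(separatedValues));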

Upvotes: 1

Shar1er80

Reputation: 9041

I couldn't figure out a regex solution, but here is a non-regex one. It reads each number that is not inside curly braces up to the next comma (or to the end of the string, if it's the last element), and reads each curly-brace group character by character until the group's matching closing brace is found.

If anyone finds a regex solution, I'd love to see it.

import java.util.ArrayList;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        String data = "1,2,{3,{4},5},{5,6},-7,{7,8},{8,{9},10},11";
        List<String> list = new ArrayList<>();
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            // A number starts with a digit, or with '-' followed by a digit (negative numbers)
            boolean startsNumber = Character.isDigit(c)
                    || (c == '-' && i + 1 < data.length() && Character.isDigit(data.charAt(i + 1)));
            if (startsNumber) {
                // Get the number before the comma, unless it's the last number
                int commaIndex = data.indexOf(",", i);
                String number = commaIndex > -1
                        ? data.substring(i, commaIndex)
                        : data.substring(i);
                list.add(number);
                // Jump to the comma; the loop's i++ then skips it
                i += number.length();
            } else if (c == '{') {
                // Get the group of numbers until you reach the final
                // closing curly brace (counting opens vs. closes handles nesting)
                StringBuilder sb = new StringBuilder();
                int openCount = 0;
                int closeCount = 0;
                do {
                    if (data.charAt(i) == '{') {
                        openCount++;
                    } else if (data.charAt(i) == '}') {
                        closeCount++;
                    }
                    sb.append(data.charAt(i));
                    i++;
                } while (closeCount < openCount);
                list.add(sb.toString());
                // i now points just past the closing brace (at the comma); the loop's i++ skips it
            }
        }

        for (int i = 0; i < list.size(); i++) {
            System.out.printf("Arr[%d]: %s%n", i, list.get(i));
        }
    }
}

Results:

Arr[0]: 1
Arr[1]: 2
Arr[2]: {3,{4},5}
Arr[3]: {5,6}
Arr[4]: -7
Arr[5]: {7,8}
Arr[6]: {8,{9},10}
Arr[7]: 11

Upvotes: 0

ShellFish

Reputation: 4551

You cannot do this with a regex if nested elements such as {{1},{2}} must be kept together. The reason is that a regex for this would have to recognize the language of balanced parentheses. That language is context-free but not regular, so it cannot be parsed with a regular expression. The best way to handle this is not to use a regex at all, but a for loop with a stack (the stack is what gives you the power to parse context-free languages). In pseudocode:

last = 0
for each index i in input
    char = input[i]
    if char is '{'
        push '{' on stack
    else if char is '}'
        pop from stack
    else if char is ',' and stack is empty
        add substring(last, i) to output array
        last = i + 1
add substring(last, end of input) to output array

This pseudocode constructs the array as desired. Note that it's best to loop over the indices of the characters rather than the characters themselves, since you need those indices to determine the boundaries of the substrings added to the array.
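A direct Java translation of the pseudocode might look like the sketch below. The class and method names are hypothetical; it uses an ArrayDeque as the stack and assumes the braces in the input are balanced (an unmatched } would make pop() throw).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class TopLevelSplitter {
    // Splits on commas that are not nested inside any {...} group.
    // Assumes the braces in the input are balanced.
    static List<String> split(String input) {
        List<String> output = new ArrayList<>();
        Deque<Character> stack = new ArrayDeque<>();
        int last = 0; // start index of the element currently being scanned
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '{') {
                stack.push(c);
            } else if (c == '}') {
                stack.pop();
            } else if (c == ',' && stack.isEmpty()) {
                // Top-level comma: everything since 'last' is one element
                output.add(input.substring(last, i));
                last = i + 1; // skip the comma itself
            }
        }
        // The final element has no trailing comma
        output.add(input.substring(last));
        return output;
    }

    public static void main(String[] args) {
        String numbers = "{1,2,{3,{4},5},{5,6}}";
        // Strip the outermost braces, as in the question
        String inner = numbers.substring(1, numbers.length() - 1);
        List<String> parts = split(inner);
        for (int i = 0; i < parts.size(); i++) {
            System.out.printf("Arr[%d]    %s%n", i, parts.get(i));
        }
    }
}
```

Since only { is ever pushed, an int depth counter would work just as well as the stack; the stack form simply mirrors the pseudocode and would be the natural starting point if several bracket types had to be matched.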

Upvotes: 3
