DeeTee

Reputation: 707

Parsing String elements between nested parentheses

I'm trying to write a little program that extracts information from between nested parentheses. For example, if I'm given the string:

"content (content1 (content2, content3) content4 (content5 (content6, content7))"

I would like this to be returned (in an ArrayList or other Collection):

["content", "content1", "content2, content3", "content4", "content5", "content6, content7"]

Are there any existing libraries or algorithms that I could use to help with this?

Thanks in advance!

Edit

Thanks for the suggestions. However, content2 and content3 should be saved in the same string in the final list, because they are within the same set of parentheses.

Upvotes: 3

Views: 8997

Answers (2)

Scooter

Reputation: 7061

This seems to handle the one example you gave above:

import java.util.ArrayList; 

public class ParseParenthesizedString {
    public enum States { STARTING, TOKEN, BETWEEN }
    public static void main(String[] args)
    {
        ParseParenthesizedString theApp = new ParseParenthesizedString();
        theApp.Answer();
    }

    public void Answer()
    {
        String theString = 
           "content (content1 (content2, content3) content4 (content5 (content6, content7))";
        // wants:
        // ["content", "content1", "content2, content3", "content4", "content5", "content6, content7"]
        States state = States.STARTING;
        ArrayList<String> theStrings = new ArrayList<String>();
        // StringBuilder is enough here: no synchronization is needed in a single thread.
        StringBuilder temp = new StringBuilder();

        for (int i = 0; i < theString.length() ; i++)
        {
            char cTemp = theString.charAt(i);
            switch (cTemp)
            {
                case '(':
                {
                    if (state == States.STARTING)  state = States.BETWEEN;
                    else if (state == States.BETWEEN)  {} 
                    else if (state == States.TOKEN )
                    {
                        state = States.BETWEEN;
                        theStrings.add(temp.toString().trim());
                        temp.delete(0,temp.length());
                    }
                    break;
                }
                case ')':
                {
                    if (state == States.STARTING) 
                    {  /* this is an error */ }
                    else if (state == States.TOKEN) 
                    {
                        theStrings.add(temp.toString().trim());
                        temp.delete(0,temp.length());
                        state = States.BETWEEN;
                    } 
                    else if (state == States.BETWEEN ) {}
                    break;
                }
                default:
                {
                    state = States.TOKEN;
                    temp.append(cTemp);
                }
            }
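        }

        // Flush a trailing token that is not followed by a parenthesis
        // (no effect on the example above, which ends with "))").
        if (state == States.TOKEN && temp.toString().trim().length() > 0)
        {
            theStrings.add(temp.toString().trim());
        }

        PrintArrayList(theStrings);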
    }
    public static void PrintArrayList(ArrayList<String> theList)
    {    
        System.out.println("The ArrayList with " 
                + theList.size() + " elements:");
        for (int i = 0; i < theList.size(); i++)
        {
            System.out.println(i + ":" + theList.get(i));
        }
    }
}

Outputs:

The ArrayList with 6 elements:
0:content
1:content1
2:content2, content3
3:content4
4:content5
5:content6, content7
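
The same single-pass idea can be packaged as a reusable method. This is only a minimal sketch, not the code above: the class name ParenTokenizer and the helpers tokenize and flush are made up for illustration. It assumes every '(' or ')' ends the current token and that blank tokens should be dropped.

```java
import java.util.ArrayList;
import java.util.List;

public class ParenTokenizer {
    // Hypothetical helper: treats every '(' or ')' as a token boundary.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '(' || c == ')') {
                flush(current, tokens);   // a parenthesis closes the current token
            } else {
                current.append(c);        // commas and spaces stay inside the token
            }
        }
        flush(current, tokens);           // text after the last parenthesis, if any
        return tokens;
    }

    // Adds the trimmed buffer to the list unless it is blank, then clears the buffer.
    private static void flush(StringBuilder current, List<String> tokens) {
        String token = current.toString().trim();
        if (!token.isEmpty()) {
            tokens.add(token);
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        String s = "content (content1 (content2, content3) content4 (content5 (content6, content7))";
        for (String token : tokenize(s)) {
            System.out.println(token);
        }
    }
}
```

Unlike the state machine, this version does not flag a stray ')' as an error. It only collects the text between parentheses.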

Upvotes: 2

ajrgrubbs

Reputation: 1

Java's String.split() will do the job for you. It takes a regex that defines the delimiter between tokens. Here the delimiters are parentheses or commas, optionally surrounded by whitespace on either side, so this splits the string at each of them (note that it also separates "content2, content3" at the comma):

String[] result = s.split("\\s*[\\(\\),]+\\s*");
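
If comma-separated items must stay in one string, as the question's edit asks, one possible variation is to split on parentheses only and trim each piece. This is a sketch under that assumption. The class name SplitOnParens and the method splitOnParens are hypothetical, while String.split, trim and isEmpty are standard Java.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitOnParens {
    // Hypothetical helper: splits on '(' and ')' only, so commas stay inside tokens.
    static List<String> splitOnParens(String s) {
        List<String> tokens = new ArrayList<>();
        for (String part : s.split("[()]")) {
            String token = part.trim();
            // Adjacent parentheses such as "))" or ") (" produce blank pieces; skip them.
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "content (content1 (content2, content3) content4 (content5 (content6, content7))";
        for (String token : splitOnParens(s)) {
            System.out.println(token);
        }
    }
}
```

Trimming and filtering after the split keeps the regex simple. A more elaborate pattern could absorb the whitespace, but a string that starts with a parenthesis would still give an empty first element.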

Upvotes: -1
