Reputation: 707
I'm trying to write a little program that extracts information from between nested parentheses. For example, if I'm given the string:
"content (content1 (content2, content3) content4 (content5 (content6, content7))"
I would like this to be returned (in an ArrayList or other Collection):
["content", "content1", "content2, content3", "content4", "content5", "content6, content7"]
Are there any existing libraries, or an algorithm, that I could use to help with this?
Thanks in advance!
Edit
Thanks for the suggestions. However, content2 and content3 should be saved in the same string in the final list, because they are within the same set of parentheses.
Upvotes: 3
Views: 8997
Reputation: 7061
This seems to meet your one example given above:
import java.util.ArrayList;
import java.util.List;

public class ParseParenthesizedString {

    public enum States { STARTING, TOKEN, BETWEEN }

    public static void main(String[] args) {
        ParseParenthesizedString theApp = new ParseParenthesizedString();
        theApp.answer();
    }

    public void answer() {
        String theString =
            "content (content1 (content2, content3) content4 (content5 (content6, content7))";
        // wants:
        // ["content", "content1", "content2, content3", "content4", "content5", "content6, content7"]
        States state = States.STARTING;
        List<String> theStrings = new ArrayList<String>();
        StringBuilder temp = new StringBuilder();
        for (int i = 0; i < theString.length(); i++) {
            char cTemp = theString.charAt(i);
            switch (cTemp) {
                case '(':
                    // An opening parenthesis ends any token in progress.
                    if (state == States.TOKEN) {
                        addToken(theStrings, temp);
                    }
                    state = States.BETWEEN;
                    break;
                case ')':
                    if (state == States.STARTING) {
                        /* this is an error: ')' before any content */
                    } else if (state == States.TOKEN) {
                        addToken(theStrings, temp);
                        state = States.BETWEEN;
                    }
                    break;
                default:
                    // Any other character belongs to the current token.
                    state = States.TOKEN;
                    temp.append(cTemp);
            }
        }
        // Keep text that follows the last parenthesis (or input with no parentheses at all).
        if (state == States.TOKEN) {
            addToken(theStrings, temp);
        }
        printList(theStrings);
    }

    // Adds the trimmed buffer to the list unless it is only whitespace, then clears the buffer.
    private static void addToken(List<String> tokens, StringBuilder buffer) {
        String token = buffer.toString().trim();
        if (!token.isEmpty()) {
            tokens.add(token);
        }
        buffer.setLength(0);
    }

    public static void printList(List<String> theList) {
        System.out.println("The ArrayList with " + theList.size() + " elements:");
        for (int i = 0; i < theList.size(); i++) {
            System.out.println(i + ":" + theList.get(i));
        }
    }
}
Outputs:
The ArrayList with 6 elements:
0:content
1:content1
2:content2, content3
3:content4
4:content5
5:content6, content7
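Since both kinds of parenthesis simply end the current token, the state machine above can probably be collapsed into "append until a parenthesis, then flush". Here is a minimal sketch of that idea as a reusable method. The names `ParenTokens`, `tokenize` and `flush` are made up for illustration, and it assumes whitespace-only tokens should be dropped and that nesting depth does not matter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the names here are not from the answer above.
class ParenTokens {

    // Collects the trimmed text found between parentheses, in order of appearance.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '(' || c == ')') {
                flush(current, tokens);   // a parenthesis ends the current token
            } else {
                current.append(c);
            }
        }
        flush(current, tokens);           // text after the last parenthesis, if any
        return tokens;
    }

    // Adds the buffered text to the list unless it is blank, then clears the buffer.
    private static void flush(StringBuilder buffer, List<String> tokens) {
        String token = buffer.toString().trim();
        if (!token.isEmpty()) {
            tokens.add(token);
        }
        buffer.setLength(0);
    }

    public static void main(String[] args) {
        String s = "content (content1 (content2, content3) content4 (content5 (content6, content7))";
        // Print one token per line so commas inside a token stay unambiguous.
        tokenize(s).forEach(System.out::println);
    }
}
```

Note that this does not check whether the parentheses are balanced. If you need to report errors such as a stray ")", you would have to keep a depth counter alongside the buffer.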
Upvotes: 2
Reputation: 1
Java's String.split() will do the job for you. It takes a regex that defines the delimiter between tokens. For you, the delimiters are runs of parentheses, optionally surrounded by whitespace on either side. Commas are left out of the character class so that "content2, content3" stays in one string. So this should do the trick:
String[] result = s.split("\\s*[()]+\\s*");
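To try the split() idea against the sample string, here is a small sketch. It makes a few assumptions: the wrapper names `SplitOnParens` and `splitOnParens` are made up, the regex is a variation that treats only parentheses (not commas) as delimiters so that "content2, content3" stays together as the question's edit asks, and empty strings are filtered out because split() returns a leading "" when the input starts with a delimiter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper; only String.split() itself comes from the answer above.
class SplitOnParens {

    static List<String> splitOnParens(String s) {
        List<String> tokens = new ArrayList<>();
        // Delimiter: optional whitespace, one parenthesis, then any run of
        // further parentheses or whitespace. Commas are not delimiters.
        for (String part : s.trim().split("\\s*[()][()\\s]*")) {
            if (!part.isEmpty()) {   // drop the "" produced by a leading delimiter
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "content (content1 (content2, content3) content4 (content5 (content6, content7))";
        // Print one token per line so commas inside a token stay unambiguous.
        splitOnParens(s).forEach(System.out::println);
    }
}
```

Like any split-based approach, this ignores nesting depth and will not notice unbalanced parentheses.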
Upvotes: -1