Reputation: 543
I need some advice on the following problem. I receive text in the following format:
"text (textInBrackets), text2 (textInBrackets2), text3 (textInBrackets3),..."
Every text and textInBrackets can contain letters, numbers, and also brackets. Pairs are separated by commas, and the closing bracket next to the comma determines where the right element of the pair ends.
I need to split the text so that every pair of text and textInBrackets is separated and stored in an array like:
String[][] pairs = new String[n][2];
pairs[0][0] = "text";
pairs[0][1] = "textInBrackets";
pairs[1][0] = "text2";
pairs[1][1] = "textInBrackets2";
Example:
String text="texttext(text)text(subtext), othertext152(de)sert(subothertext), textwithoutbracket, elems(subelem)";
String[][] result = splitFunction(text);
The returned array is:
String[][] pairs = new String[n][2];
pairs[0][0] = "texttext(text)text";
pairs[0][1] = "subtext";
pairs[1][0] = "othertext152(de)sert";
pairs[1][1] = "subothertext";
pairs[2][0] = "textwithoutbracket";
pairs[2][1] = null;
pairs[3][0] = "elems";
pairs[3][1] = "subelem";
I already have a solution for the problem, but it is not bulletproof and has some bugs.
Upvotes: 2
Views: 162
Reputation: 751
What you are trying to achieve is actually a hard problem to implement (if brackets inside the bracketed text must be balanced, for example "(sa(ssa)sa)"). If the text inside brackets could not itself contain bracketed text, the solution would be quite easy, as people have already proposed. Code to verify such a pattern and extract groups from it would look like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "text (textInBrackets), text2 (textInBrackets2), text3 (textInBrackets3)";
Pattern pattern = Pattern.compile("(\\w+ \\(\\w+\\))((, \\w+ \\(\\w+\\))*)");
Matcher matcher = pattern.matcher(text);
System.out.println(matcher.matches());
System.out.println(matcher.group(0)); // the entire match
System.out.println(matcher.group(1)); // the first pair
System.out.println(matcher.group(2)); // all remaining pairs
System.out.println(matcher.group(3)); // only the last repetition of the inner group
with output:
true
text (textInBrackets), text2 (textInBrackets2), text3 (textInBrackets3)
text (textInBrackets)
, text2 (textInBrackets2), text3 (textInBrackets3)
, text3 (textInBrackets3)
But your specification also says that the text inside brackets may itself contain bracketed text (I don't know whether those inner brackets must be balanced; if not, what follows does not apply to your case). Such text is no longer a regular language (which can be parsed with a regex) but a context-free one. To verify and parse it you need an implementation with a stack, where you push on every left bracket and pop when you find the matching right bracket. This is exactly what a pushdown automaton, which can parse context-free grammars, does. Your text would still be a regular language if you knew the maximum depth to which bracketed text can be nested.
For example:
"text (sad(sdasddsa)sadas)"
Here you know that bracketed text is nested at most once, and you can adjust your manual implementation or regex accordingly. Such a pattern would look like this (it might differ depending on how you want it to behave, whether empty brackets are valid, and so on):
Pattern pattern = Pattern.compile("(\\w+ \\(\\w+(\\(\\w*\\))*\\w+\\))((, \\w+ \\(\\w+(\\(\\w*\\))*\\w+\\))*)");
You can see that I had to adjust my pattern so that it contains information about the nested brackets. You can do this X times, but you cannot do it forever. That is exactly where this problem stops being a regular language and becomes a context-free one.
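To make the depth limit concrete, here is a small sketch that runs the pattern above against one input with a single nested level and one with two. The class name NestingDepthDemo and the helper matches are just names I made up for the illustration.

```java
import java.util.regex.Pattern;

class NestingDepthDemo {
    // Same pattern as above: it allows at most one nested bracket level
    // inside the outer brackets of each pair.
    static final Pattern ONE_LEVEL = Pattern.compile(
        "(\\w+ \\(\\w+(\\(\\w*\\))*\\w+\\))((, \\w+ \\(\\w+(\\(\\w*\\))*\\w+\\))*)");

    // Hypothetical helper: true if the whole input fits the pattern.
    static boolean matches(String input) {
        return ONE_LEVEL.matcher(input).matches();
    }

    public static void main(String[] args) {
        // One nested level: accepted.
        System.out.println(matches("text (sad(sdasddsa)sadas)"));   // true
        // Two nested levels: rejected, the pattern would need another layer.
        System.out.println(matches("text (sad(sd(as)ddsa)sadas)")); // false
    }
}
```

Every extra nesting level you want to accept means writing another layer into the pattern by hand.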
Once you have no bound on the nesting level (and there can be N nested levels), you need a context-free grammar (or a pushdown automaton). This is quite a hard topic to explain, because it requires some background in automata theory, grammars, how regexes relate to regular grammars, and so on, so I suggest you read up on it to understand my answer. If you don't have much time to resolve this issue, just pass these arguments on to whoever asked you to implement it, and make your program handle nested brackets up to a maximum nesting level of 1, for example.
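For the unbounded case, the stack idea can be sketched roughly as follows. This is only one possible reading of the requirements, not a definitive implementation: I assume brackets are balanced within each pair, that commas inside brackets do not separate pairs, and that a pair only has a right element when it ends with a bracket group. The names BracketPairSplitter, splitPairs and toPair are made up for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class BracketPairSplitter {

    // Splits "left (right), left2 (right2), ..." into {left, right} pairs.
    // Assumption: brackets are balanced within each pair.
    static String[][] splitPairs(String text) {
        List<String[]> pairs = new ArrayList<>();
        Deque<Integer> open = new ArrayDeque<>(); // positions of unmatched '('
        int start = 0;                            // start of the current pair
        int groupStart = -1, groupEnd = -1;       // last top-level "(...)" seen in this pair
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '(') {
                open.push(i);
            } else if (c == ')' && !open.isEmpty()) {
                int match = open.pop();
                if (open.isEmpty()) {             // a top-level group just closed
                    groupStart = match;
                    groupEnd = i;
                }
            } else if (c == ',' && open.isEmpty()) {
                // A comma outside all brackets ends the current pair.
                pairs.add(toPair(text, start, i, groupStart, groupEnd));
                start = i + 1;
                groupStart = groupEnd = -1;
            }
        }
        pairs.add(toPair(text, start, text.length(), groupStart, groupEnd));
        return pairs.toArray(new String[0][]);
    }

    private static String[] toPair(String text, int start, int end, int groupStart, int groupEnd) {
        // The group is the right element only if nothing but whitespace follows it.
        boolean endsWithGroup = groupEnd >= 0 && text.substring(groupEnd + 1, end).trim().isEmpty();
        if (!endsWithGroup) {
            return new String[] { text.substring(start, end).trim(), null };
        }
        return new String[] {
            text.substring(start, groupStart).trim(),
            text.substring(groupStart + 1, groupEnd)
        };
    }

    public static void main(String[] args) {
        String text = "texttext(text)text(subtext), othertext152(de)sert(subothertext), "
                    + "textwithoutbracket, elems(subelem)";
        System.out.println(Arrays.deepToString(splitPairs(text)));
        System.out.println(Arrays.deepToString(splitPairs("a (b(c)d), e")));
    }
}
```

On the sample from the question this prints [[texttext(text)text, subtext], [othertext152(de)sert, subothertext], [textwithoutbracket, null], [elems, subelem]], and for "a (b(c)d), e" it prints [[a, b(c)d], [e, null]]. Unbalanced input is not validated here; you would have to decide how to report it.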
Upvotes: 4
Reputation: 89139
You can split on a comma and space and then use lastIndexOf and substring to divide the parts.
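String[] parts = text.split(", ");
String[][] result = new String[parts.length][2];
for (int i = 0; i < parts.length; i++) {
    String part = parts[i];
    // The last '(' starts the right element; this assumes the right element has no nested brackets.
    int lastIdx = part.lastIndexOf('(');
    if (lastIdx == -1 || !part.endsWith(")")) {
        // No trailing bracket group: the right element stays null.
        result[i][0] = part;
    } else {
        result[i][0] = part.substring(0, lastIdx);
        result[i][1] = part.substring(lastIdx + 1, part.length() - 1);
    }
}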
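As a quick check, here is the same split-then-lastIndexOf idea wrapped in a runnable class. The names LastIndexSplitDemo and split are made up for the example. The second call shows the limitation: once the right element itself contains brackets, the last '(' is no longer the one that opens it.

```java
import java.util.Arrays;

class LastIndexSplitDemo {
    // Split on ", ", then cut each part at its last '('.
    static String[][] split(String text) {
        String[] parts = text.split(", ");
        String[][] result = new String[parts.length][2];
        for (int i = 0; i < parts.length; i++) {
            String part = parts[i];
            int lastIdx = part.lastIndexOf('(');
            if (lastIdx == -1) {
                result[i][0] = part; // no bracket: right element stays null
            } else {
                result[i][0] = part.substring(0, lastIdx);
                result[i][1] = part.substring(lastIdx + 1, part.length() - 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [[texttext(text)text, subtext], [textwithoutbracket, null], [elems, subelem]]
        System.out.println(Arrays.deepToString(
            split("texttext(text)text(subtext), textwithoutbracket, elems(subelem)")));
        // Nested brackets in the right element are cut in the wrong place:
        // prints [[a(b, c)d]]
        System.out.println(Arrays.deepToString(split("a(b(c)d)")));
    }
}
```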
Upvotes: 2