Reputation: 1001
I need to use Java, so scripting language support must NOT be used.
The string representation I need to parse is like:
op1 (t1,t2,t3,...)
where t1, t2, t3 etc. can again be of something like op2 (t11,t12,t13 ...)
or just an atomic unit (can not be composed of elements in itself)
A concrete example is:
op1 (op2 (t1 t2) t3)
I want to parse it in a tree like structure (hierarchical)
op1
op2
t1
t2
t3
Assuming op1 is the root of the tree, op2 is the left subchild of op1, and t3 is the right subchild of op1. t1 and t2 are the subchildren of op2, respectively.
How can I do it in Java? THe challenging part is that the resulting tree must not be a binary tree. A node can have arbitrary number of children.
Upvotes: 3
Views: 6425
Reputation: 1777
First of all I would recommend you to split this problem on two sub problems, first - parsing input, second - creating tree.
You could use string tokenizer and stack for the first one, you could skip "op1"s because in general your problem looks like ((t1 t2) t3). So when your current element will be = ")" you should pop elements from your stack until you'll reach "(" and create new Node out of that elements and put it back to stack. Another question is what type of data you'll store in the stack, obviously it wouldn't be strings, so probably you have to create some hierarchy of elements which could be placed there, e.g. StringElement, NodeElement, StartQuoteElement.
Upvotes: 0
Reputation: 22514
Just for fun, a very quick and very dirty, and provably buggy solution. Hey, at least it solves your example!!!!
package parser;
import java.util.Collection;
import java.util.ArrayList;
import java.util.Arrays;
/**
*
* @author gpeche
*/
public class Parser {
private static abstract class Node {
String name;
String getName() { return name; }
abstract boolean isComposite();
Node(String name) { this.name = name; }
}
private static class PrimitiveNode extends Node {
@Override boolean isComposite() { return false; }
PrimitiveNode(String name) { super(name); }
@Override public String toString() { return "PrimitiveNode(\"" + name + "\")"; }
}
private static class CompositeNode extends Node {
Collection<Node> children = new ArrayList<Node>();
void addChild(Node childNode) { children.add(childNode); }
Collection<Node> getChildren() { return new ArrayList<Node>(children); }
@Override boolean isComposite() { return false; }
CompositeNode(String name) { super(name); }
@Override public String toString() {
StringBuilder sb = new StringBuilder();
sb.append("CompositeNode(\"");
sb.append(name);
sb.append("\") { ");
boolean isFirstNode = true;
for (Node node: children) {
if (!isFirstNode) {
sb.append(", ");
}
sb.append(node.toString());
isFirstNode = false;
}
sb.append(" }");
return sb.toString();
}
}
// Parser state
int pos = 0;
String[] tokens;
static final String OPENING_PAR = "(";
static final String CLOSING_PAR = ")";
static final String OP_PLUS = "+";
static final String OP_MINUS = "-";
static final String OP_MULTIPLY = "*";
static final String OP_DIVIDE = "/";
static final String OP_MODULUS = "mod";
static final String OP_INT_DIVIDE = "div";
static final Collection<String> PARENTHESIS = Arrays.asList(OPENING_PAR, CLOSING_PAR);
static final Collection<String> OPERATIONS = Arrays.asList( OP_PLUS,
OP_MINUS,
OP_MULTIPLY,
OP_DIVIDE,
OP_MODULUS,
OP_INT_DIVIDE);
public Node parse(String treeRepresentation) {
tokenize(treeRepresentation);
return parseNode();
}
private void tokenize(String treeRepresentation) {
treeRepresentation = treeRepresentation.replace("(", " ( ");
treeRepresentation = treeRepresentation.replace(")", " ) ");
tokens = treeRepresentation.split("\\s+");
}
private Node parseNode() {
// check that current token is not a "syntax" token
String currentToken = tokens[pos];
if (PARENTHESIS.contains(currentToken)) {
throw new IllegalArgumentException(String.format("Invalid token %d, expected identifier. got %s", pos + 1, currentToken));
}
boolean isComposite = currentToken != null &&
(currentToken.startsWith("op") || // Accept identifiers as operations (function support :P)
OPERATIONS.contains(currentToken));
return isComposite? parseComposite() : parsePrimitive();
}
private Node parseComposite() {
CompositeNode composite = new CompositeNode(tokens[pos]);
pos++;
if (!OPENING_PAR.equals(tokens[pos])) {
throw new IllegalArgumentException(String.format("Invalid token %d, expected '(', got %s", pos + 1, tokens[pos]));
} else {
// Ignore opening parenthesis
pos++;
}
boolean nextIsIdentifier;
do {
composite.addChild(parseNode());
nextIsIdentifier = !PARENTHESIS.contains(tokens[pos]);
} while (nextIsIdentifier);
if (!CLOSING_PAR.equals(tokens[pos])) {
throw new IllegalArgumentException(String.format("Invalid token %d, expected ')', got %s", pos + 1, tokens[pos]));
} else {
pos++;
}
return composite;
}
private Node parsePrimitive() {
// Create primitive node and advance position
Node result = new PrimitiveNode(tokens[pos]);
pos++;
return result;
}
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
Parser parser = new Parser();
Node parsedNode = parser.parse("op1 (op2 (t1 t2) t3)");
System.out.println(parsedNode.toString());
}
}
Upvotes: 0
Reputation: 26574
If you can't use JavaCC then you could look at the StringTokenizer class. You could do it in a couple passes. First, tokenize on the parenthesis, creating a first pass tree. Then you could walk the tree and tokenize on space fleshing out the tree nodes further for those nodes that only have leafs and not nested trees (i.e. have no children that contain parens)
op1 (op2 (t1 t2) t3)
when tokenized on '(' and ')' would give the tokens (assuming you ask for tokens to be included) op1, (, op2, (, t1 t2, ), t3, )
From that you walk through the tokens. You know your first is the parent. Second is a paren so you know that you have a complex child. so your tree would be:
op1
op2
Then you hit another paren, meaning a new complex child. The next token after the second open paren is t1 t2
so your tree is
op1
op2
t1 t2
Then you get a close paren token, so you end the complex child of op2 and your next token is the next child of op1, meaning your tree would then look like
op1
op2
t1 t2
t3
Finally you hit the last close paren, which ends the complex child of op1.
Now you can walk the tree, splitting the child nodes on space. First node is op1 so no split, same with op2. 't1 t2' splits into 't1', and 't2' so you end up splitting that node into two so your tree looks like
op1
op2
t1
t2
t3
You could probably easily put the space splitting into the first method so you wouldn't have to walk the tree twice.
Upvotes: 2