jerluc
jerluc

Reputation: 4316

java parsing data file format

I have a data file, which looks like this:


0.5, 0.0 [
 1.5, -1.0 [
  inputs
 ]
 ;
 0.5, 1.0 [
  inputs
 ]
]

Which I'm trying to parse into a tree-like structure. In the above example, the tree should end up like so:


             Node (0.5, 0.0)
             /             \
   Node (1.5, -1.0)   Node (0.5, 1.0)
                 \     /
               Inputs Node

Pretty much the tree structure is like any basic tree (with the exception that all the bottom-most nodes connect to the same inputs node.

So far, to parse it, I have the following:


private void createSubLayer (String net, Node parent, int level) {
  level++;
  String[] nodes = net.split(";");

  for (String node : nodes) {
   if (node.equals("inputs")) {
    System.out.println("Connecting input @ " + level);
    for (Node n : inputs) {
     parent.connect(n);
    }
   }
   else {
    Node newNode;
    String[] nodeInfo = node.split("\\[", 2);
    String nodeDetails = nodeInfo[0];
    System.out.println(nodeInfo.length);
    System.out.println(nodeDetails);
    String subNet = nodeInfo[1].substring(0, nodeInfo[1].length() - 1);
    String[] nodeTW = nodeDetails.split(",");
    double threshhold = Double.parseDouble(nodeTW[0]);
    double weight = (nodeTW.length == 2) ? Double.parseDouble(nodeTW[1]) : 0.0;
    newNode = new Node(threshhold);
    newNode.setWeight(weight);
    System.out.println("Connecting new node @ " + level + "\n\tThreshhold: " + threshhold + "\n\tWeight: " + weight);
    if (parent != null) {
     parent.connect(newNode);
    }
    else {
     root = newNode;
    }

    System.out.println("Using subnet: " + subNet);
    createSubLayer(subNet, newNode, level);
   }
  }
 }

And I'm calling it with


createSubLayer(data_file_contents, null, 0);

So far, this works fine for very basic data such as


1.9, 1.0[inputs]

However, the problem seems to be arising when I am splitting by the semi-colon in the first example. For obvious reasons, splitting this first results in (using the first example):


0.5, 0.0 [
    1.5, -1.0 [
        inputs
    ]

and


    0.5, 1.0 [
        inputs
    ]
]

Which is not the intended result.

How may I go about modifying this parsing process (or if need be, modifying the data file structure) to create the desired results? (dont worry about the Node.connect() calls or anything else, as long as I can get the structure correct)

As a fairly straightforward comparison, this structuring is essentially like an XML document, or JSON, or other similar formats, just lacking attribute and node names (since there are always only the two numeric attributes in order, and the node contents).

Thank you for any help!

Upvotes: 1

Views: 1099

Answers (2)

Bart Kiers
Bart Kiers

Reputation: 170158

In case your format is subject to change (and may get a bit more complex), you could consider using a tool like ANTLR as Toader suggested. You then only need to write (or change) your grammar to generate a (new) lexer & parser. Take the following grammar:

grammar Test;

parse
  :  element+ EOF
  ;

element
  :  numberList Open atom (SemiCol atom)* Close
  ;

numberList
  :  Decimal (Comma Decimal)*
  ;

atom
  :  element
  |  Identifier
  ;

Open       : '[';
Close      : ']';
Comma      : ',';
SemiCol    : ';';
Identifier : ('a'..'z' | 'A'..'Z')+;
Decimal    : '0'..'9'+ '.' '0'..'9'+;
Spaces     : (' ' | '\t' | '\r' | '\n') {skip();};

When interpreting your example:

0.5, 0.0 [
 1.5, -1.0 [
  inputs
 ]
 ;
 0.5, 1.0 [
  inputs
 ]
]

ANTLRWorks produces the following parse tree:

alt text

Upvotes: 2

Mihai Toader
Mihai Toader

Reputation: 12243

You need do a proper parsing when you have recursive structures as is this case. I would suggest you to look into http://www.antlr.org/ in order to write easily such a parser.

Another approach is to try to write a manual recursively descent parser (your format suggest to me that it ca be done) but it might seems a little complicated if you don't have an idea how it's done.

The general idea is to have a method named parseNode for example who would look if the next input is either a number or a name like inputs. If it is number will parse numbers until if finds a [ char. After that it will call parseNode recursively. After the parse node returns it would look at the next char and if it is ] it means it parsed all the childs. If not then the char should have been ; and it needs to eat it and call parseNode again. Once it finds ] it will return.

Basically this is how i would do it.

The following code will parse your string properly BUT remember that it has absolutely no error checking for invalid input, no proper parsing of strings into number and so on. It does however shows some working code for what i have suggested above. You should NOT put this code in production :).

import java.util.ArrayList;
import java.util.List;

public class Main {

    static Node input = new Node();

    public static class Node {
        String numbers;
        List<Node> childs;
    }

    static class Input {
        String data;
        int pos;

        Input(String data, int pos) {
            this.data = data;
            this.pos = pos;
        }
    }

    public static void main(String[] args) {
        String data = "0.5, 0.0 [\n" +
                " 1.5, -1.0 [\n" +
                "  inputs\n" +
                " ]\n" +
                " ;\n" +
                " 0.5, 1.0 [\n" +
                "  inputs\n" +
                " ]\n" +
                "]";

        Node node = parseNode(new Input(data, 0));
    }

    private static Node parseNode(Input input) {
        StringBuffer stringBuffer = new StringBuffer();

        // eat chars until '[' or ']' or ';' or end of string
        boolean completed = false;
        char ch = input.data.charAt(input.pos);

        while (!completed && input.pos < input.data.length()) {
            ch = input.data.charAt(input.pos);
            switch (ch) {
                case '[':
                case ']':
                case ';':
                    completed = true;
                    break;
                default:
                    input.pos++;
                    stringBuffer.append(ch);
            }
        }

        String numbers = stringBuffer.toString().trim();

        if ( numbers.equalsIgnoreCase("inputs") ) {
            return Main.input;
        }

        Node thisNode = new Node();

        thisNode.numbers = numbers;
        thisNode.childs = new ArrayList<Node>();

        if ( ch == '[' ) { // we have childs
            do {
                input.pos++;
                thisNode.childs.add(parseNode(input));

                ch = input.data.charAt(input.pos);
                while ( ch != ';' && ch != ']' ) {
                    input.pos++;
                    ch = input.data.charAt(input.pos);
                }
            } while (ch == ';');

            if ( ch == ']' ) {
                input.pos++;
            }
        }

        return thisNode;
    }
}

Upvotes: 3

Related Questions