sdasdadas

Reputation: 25096

What is the best way to parse a non-flat file format in Java?

I am attempting to parse a nested file format in Java.

The file format looks like this:

head [

    A [
        property value
        property2 value
        property3 [
            ... down the rabbit hole ...
        ]
    ]

    ... more As ...

    B [
        .. just the same as A
    ]

    ... more Bs ...
]

What is the best/easiest technique to parse this into my program?

As a side note, I have no control over the format - because I knew someone would say it!

Upvotes: 1

Views: 363

Answers (4)

Joni

Reputation: 111239

Arguably the easiest way to parse files of this kind is with a recursive descent parser (http://en.m.wikipedia.org/wiki/Recursive_descent_parser). I guess this is what you mean by manually reading the file and keeping track of the structure you have found so far.

A finite state machine won't work if you have to deal with unlimited nesting, because it has no stack to remember how many blocks are still open. If the nesting is limited to a fixed depth, say two levels, it could be enough.

Writing a grammar and generating a parser would also work, but if you haven't done that before or don't have the time to learn how to use the tools it's probably overkill...

Upvotes: 1

Dale Wilson

Reputation: 9434

I second the recommendation to take a look at ANTLR. StAX adds SAX-like event handling.

http://www.antlr.org/wiki/display/ANTLR3/Interfacing+StAX+to+ANTLR

Yes, there is a learning curve, but by the time you have handled all the odd cases and debugged your own code, you'd probably break even. Plus, you'd have a new item on your resume.

Upvotes: 1

Sergey Kalinichenko

Reputation: 726539

If the grammar is indeed nested like this, writing a very simple top-down parser would be a trivial task: you have very few tokens to recognize, and the nested structure repeats itself very conveniently for a textbook recursive-descent parser.
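To give an idea of how small such a parser can be, here is a minimal sketch. It rests on assumptions the question does not confirm: tokens are separated by whitespace, values contain no spaces, and every entry is either `name [ ... ]` or `name value`. The `Node` and `NestedParser` names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tree node: a leaf has a value, a block has children. */
class Node {
    final String name;
    final String value;                          // null for blocks
    final List<Node> children = new ArrayList<>();

    Node(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

/** Recursive-descent parser for: entry := NAME '[' entry* ']' | NAME VALUE */
class NestedParser {
    private final String[] tokens;
    private int pos = 0;

    private NestedParser(String[] tokens) {
        this.tokens = tokens;
    }

    /** Parses a single top-level entry such as "head [ ... ]". */
    static Node parse(String text) {
        // Assumption: brackets never occur inside names or values,
        // so padding them with spaces makes them separate tokens.
        String padded = text.replace("[", " [ ").replace("]", " ] ").trim();
        NestedParser parser = new NestedParser(padded.split("\\s+"));
        Node root = parser.entry();
        if (parser.pos != parser.tokens.length) {
            throw new IllegalArgumentException("Unexpected token: " + parser.peek());
        }
        return root;
    }

    private Node entry() {
        String name = next();
        if (name.equals("[") || name.equals("]")) {
            throw new IllegalArgumentException("Expected a name but found " + name);
        }
        String second = next();
        if (second.equals("]")) {
            throw new IllegalArgumentException("Missing value for " + name);
        }
        if (!second.equals("[")) {
            return new Node(name, second);       // plain "name value" pair
        }
        Node block = new Node(name, null);
        while (!peek().equals("]")) {
            block.children.add(entry());         // recurse for each nested entry
        }
        next();                                  // consume the closing ']'
        return block;
    }

    private String peek() {
        if (pos >= tokens.length) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        return tokens[pos];
    }

    private String next() {
        String token = peek();
        pos++;
        return token;
    }
}
```

Calling `NestedParser.parse(fileContents)` would return the `head` node, with each `A` and `B` block as a child. Children are kept in a list rather than a map because the format allows repeated names.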

I would not even bother with ANTLR or another parser generator for something this simple, because the learning curve would eat the potential benefits for the project*.


* Potential benefits for you from learning a parser generator are hard to overestimate: if you can spend a day or two learning to build parsers with ANTLR, your view of structured text files will change forever.

Upvotes: 2

Peter Lawrey

Reputation: 533492

The fastest approach is to use an existing format that already handles this kind of nesting, e.g. JSON or YAML. These formats are widely supported by existing libraries.

"As a side note, I have no control over the format"

In that case, the best way to learn how to parse something that is like YAML, but not quite, is to read the code of a simple YAML parser.

Just parsing the file is unlikely to be enough; you will also want to trigger events or generate a data model from the data you load.
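As a rough illustration of the event-triggering option, the sketch below reports blocks and properties to a callback instead of building a tree. Everything in it is an assumption made for the example: the `BlockHandler` interface and the class names are hypothetical, tokens are taken to be whitespace-separated, and values are assumed to contain no spaces.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback interface, loosely in the style of SAX handlers. */
interface BlockHandler {
    void startBlock(String name);
    void property(String name, String value);
    void endBlock(String name);
}

/** Walks the tokens and fires one event per block boundary or property. */
class EventParser {
    private final String[] tokens;
    private final BlockHandler handler;
    private int pos = 0;

    private EventParser(String[] tokens, BlockHandler handler) {
        this.tokens = tokens;
        this.handler = handler;
    }

    static void parse(String text, BlockHandler handler) {
        // Assumption: brackets never occur inside names or values.
        String padded = text.replace("[", " [ ").replace("]", " ] ").trim();
        EventParser parser = new EventParser(padded.split("\\s+"), handler);
        parser.entry();
        if (parser.pos != parser.tokens.length) {
            throw new IllegalArgumentException("Unexpected token: " + parser.peek());
        }
    }

    private void entry() {
        String name = next();
        String second = next();
        if (name.equals("[") || name.equals("]") || second.equals("]")) {
            throw new IllegalArgumentException("Malformed entry near " + name);
        }
        if (!second.equals("[")) {
            handler.property(name, second);      // plain "name value" pair
            return;
        }
        handler.startBlock(name);
        while (!peek().equals("]")) {
            entry();                             // nested entries fire their own events
        }
        next();                                  // consume the closing ']'
        handler.endBlock(name);
    }

    private String peek() {
        if (pos >= tokens.length) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        return tokens[pos];
    }

    private String next() {
        String token = peek();
        pos++;
        return token;
    }
}

/** Example handler that simply records the events it receives. */
class RecordingHandler implements BlockHandler {
    final List<String> events = new ArrayList<>();

    public void startBlock(String name) { events.add("start " + name); }
    public void property(String name, String value) { events.add("property " + name + "=" + value); }
    public void endBlock(String name) { events.add("end " + name); }
}
```

A handler like this keeps memory use flat for large files. If you need a data model instead, the handler can push and pop its own objects on a stack as blocks start and end.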

Upvotes: 0
