Reputation: 25096
I am attempting to parse a nested file format in Java.
The file format looks like this:
head [
    A [
        property value
        property2 value
        property3 [
            ... down the rabbit hole ...
        ]
    ]
    ... more As ...
    B [
        ... just the same as A
    ]
    ... more Bs ...
]
What is the best/easiest technique to parse this into my program?
Finite State Machine?
Manually read it word by word and keep track of what part of the structure I am in?
Write a grammar...?
As a side note, I have no control over the format - because I knew someone would say it!
Upvotes: 1
Views: 363
Reputation: 111239
Arguably the easiest way to parse files of this kind is with a recursive descent parser (http://en.m.wikipedia.org/wiki/Recursive_descent_parser). I guess this is what you mean by manually reading and keeping track of the structure you have found.
A finite state machine wouldn't work if you have to be able to deal with unlimited nesting. If there are only two levels it could be enough.
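To make that concrete, here is a minimal recursive descent sketch. It assumes every token (names, values, `[` and `]`) is separated by whitespace and that values are single words; the question does not confirm either, and the class and method names are made up for illustration. The point is that the Java call stack tracks the nesting depth, which is exactly what a plain finite state machine cannot do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NestedParser {
    /** A node is either a leaf (name + value) or a block (name + children). */
    static final class Node {
        final String name;
        final String value;                      // null for blocks
        final List<Node> children = new ArrayList<>();
        Node(String name, String value) { this.name = name; this.value = value; }
    }

    private final List<String> tokens;
    private int pos = 0;

    NestedParser(String input) {
        // Assumption: whitespace separates all tokens, including the brackets.
        tokens = Arrays.asList(input.trim().split("\\s+"));
    }

    /** entry := NAME '[' entry* ']'  |  NAME VALUE */
    Node parseEntry() {
        String name = next();
        if (name.equals("[") || name.equals("]")) {
            throw new IllegalStateException("expected a name but found " + name);
        }
        String second = next();
        if (!second.equals("[")) {
            return new Node(name, second);       // plain "property value" pair
        }
        Node block = new Node(name, null);
        while (!peek().equals("]")) {
            block.children.add(parseEntry());    // recursion handles any depth
        }
        next();                                  // consume the closing "]"
        return block;
    }

    private String peek() {
        if (pos >= tokens.size()) {
            throw new IllegalStateException("unexpected end of input");
        }
        return tokens.get(pos);
    }

    private String next() {
        String t = peek();
        pos++;
        return t;
    }
}
```

Calling `new NestedParser(text).parseEntry()` on the whole file would return the `head` node with the As and Bs as its children.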
Writing a grammar and generating a parser would also work, but if you haven't done that before or don't have the time to learn how to use the tools it's probably overkill...
Upvotes: 1
Reputation: 9434
I second the recommendation to take a look at ANTLR. StAX adds SAX-like event handling.
http://www.antlr.org/wiki/display/ANTLR3/Interfacing+StAX+to+ANTLR
Yes, there is a learning curve, but by the time you had handled all the odd cases and debugged your own code, you'd probably break even, plus you'd have a new item on your resume.
Upvotes: 1
Reputation: 726539
If the grammar is indeed nested like this, writing a very simple top-down parser would be a trivial task: you have very few tokens to recognize, and the nested structure repeats itself very conveniently for a textbook recursive-descent parser.
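As a rough illustration of how few tokens there are, the sketch below recognizes only three kinds: `[`, `]`, and a word. It assumes words never contain whitespace or brackets and that there is no quoting or escaping, none of which the question confirms; `BracketTokenizer` and `tokenize` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class BracketTokenizer {
    /** Splits input into "[", "]" and whitespace-delimited words. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean bracket = (c == '[' || c == ']');
            if (bracket || Character.isWhitespace(c)) {
                // A bracket or whitespace ends the word collected so far.
                if (word.length() > 0) {
                    tokens.add(word.toString());
                    word.setLength(0);
                }
                if (bracket) {
                    tokens.add(String.valueOf(c));
                }
            } else {
                word.append(c);
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString());         // trailing word at end of input
        }
        return tokens;
    }
}
```

With the tokens in a list, the parser itself only has to decide whether a name is followed by `[` (a nested block) or by a value.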
I would not even bother with ANTLR or another parser generator for something this simple, because the learning curve would eat the potential benefits for the project.
Upvotes: 2
Reputation: 533492
The fastest approach is to use an existing format like this, e.g. JSON or YAML. These formats handle this kind of nesting and are widely supported.
"As a side note, I have no control over the format"
If you want to know the best way to parse something that is like YAML but not quite YAML, read the code for a simple YAML parser.
Just parsing the file is unlikely to be enough; you will also want to trigger events or generate a data model from the data you load.
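One way to trigger events could look like the sketch below. The `Handler` interface is hypothetical, loosely modelled on SAX-style callbacks, and the code assumes whitespace-separated tokens and single-word values. It uses an explicit stack of open block names instead of recursion and does only minimal validation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class EventParser {
    /** Hypothetical callback interface; not part of any library. */
    interface Handler {
        void startBlock(String name);
        void property(String name, String value);
        void endBlock(String name);
    }

    static void parse(String input, Handler handler) {
        // Assumption: whitespace separates all tokens, including the brackets.
        String[] tokens = input.trim().split("\\s+");
        Deque<String> open = new ArrayDeque<>();     // names of blocks not yet closed
        int i = 0;
        while (i < tokens.length) {
            String t = tokens[i];
            if (t.equals("]")) {
                if (open.isEmpty()) {
                    throw new IllegalStateException("unmatched ]");
                }
                handler.endBlock(open.pop());
                i++;
            } else {
                if (i + 1 >= tokens.length) {
                    throw new IllegalStateException("dangling name: " + t);
                }
                String next = tokens[i + 1];
                if (next.equals("[")) {
                    open.push(t);                    // name followed by "[" opens a block
                    handler.startBlock(t);
                } else {
                    handler.property(t, next);       // otherwise it is a name/value pair
                }
                i += 2;
            }
        }
        if (!open.isEmpty()) {
            throw new IllegalStateException("unclosed block: " + open.peek());
        }
    }
}
```

A handler can then build whatever data model you need, for example by creating an object in `startBlock`, filling it in `property`, and attaching it to its parent in `endBlock`.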
Upvotes: 0