Armin

Reputation: 1395

How do I parse a DTD file in Java, one element at a time and without any validation?

I have been given an invalid DTD file that has duplicate elements and the elements are not identical:

<!ELEMENT Data (Name, address?)>
<!ELEMENT Data (Name, age)>

And I need to write a utility which reads the DTD and merges the elements like the following:

<!ELEMENT Data (Name, address?, age)>

I can't seem to find a Java library that lets me parse a DTD one element at a time (like SAX does for XML).

What I am really after is to read the <!ELEMENT Data (Name, address?)> into a data structure like a Map of arrays or something similar.

Any pointers will be much appreciated.

Upvotes: 1

Views: 807

Answers (1)

Ira Baxter

Reputation: 95362

Seems to me you have to read all the DTD ELEMENTs at once, or you can't pair them up as you have shown in your example.

Because DTD descriptions can have arbitrary nesting of (...), regular expressions can't help you in theory. As a practical matter, most DTD ELEMENT declarations have only one or two layers of (...), so regular expressions might work. If your problem largely looks like what you have shown, you can do this with just string hacking and fix the rest by hand. (Reading single lines won't cut it: ELEMENT declarations can span multiple lines and end with "...>", and you'll have to find that terminator.)
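The string-hacking route could look roughly like this in Java. This is only a sketch under strong assumptions: it reads the whole DTD text at once, it only recognises ELEMENT declarations whose content model is a single flat, comma-separated (...) group, and it treats `Name` and `Name?` as different children. Declarations with nested groups, a trailing `*` or `+`, `|` choices, `EMPTY` or `ANY` are not handled and would still need fixing by hand. The class and method names (`DtdElementMerger`, `merge`, `render`) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DtdElementMerger {
    // Matches <!ELEMENT name (a, b?, c)> where the group has no nested parentheses.
    // [^()] also matches newlines, so a declaration may span several lines.
    private static final Pattern ELEMENT =
        Pattern.compile("<!ELEMENT\\s+([^\\s()>]+)\\s*\\(([^()]*)\\)\\s*>");

    // Collects the children of every matching declaration, keyed by element name.
    // Duplicate declarations are merged; children keep first-seen order.
    static Map<String, List<String>> merge(String dtd) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        Matcher m = ELEMENT.matcher(dtd);
        while (m.find()) {
            List<String> children =
                merged.computeIfAbsent(m.group(1), k -> new ArrayList<>());
            for (String part : m.group(2).split(",")) {
                String child = part.trim();
                if (!child.isEmpty() && !children.contains(child)) {
                    children.add(child);
                }
            }
        }
        return merged;
    }

    // Writes the merged map back out as one ELEMENT declaration per name.
    static String render(Map<String, List<String>> merged) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<String>> e : merged.entrySet()) {
            sb.append("<!ELEMENT ").append(e.getKey())
              .append(" (").append(String.join(", ", e.getValue())).append(")>\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The second declaration deliberately spans two lines.
        String dtd = "<!ELEMENT Data (Name, address?)>\n"
                   + "<!ELEMENT Data (Name,\n   age)>\n";
        System.out.print(render(merge(dtd)));
        // prints: <!ELEMENT Data (Name, address?, age)>
    }
}
```

In real use the input string would come from the file (for example via `Files.readString`), and anything the pattern does not match is silently skipped, so it is worth comparing the number of `<!ELEMENT` occurrences in the input with the number of matches before trusting the output.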

If you want a reliable automated approach, you need what amounts to a program transformation system. DTDs are a particular type of formal system; you need a tool that can read instances of the formal description, give you access to read and update the data structures that represent the instance (typically called Abstract Syntax Trees), and write the results back out as valid source text.

Not in Java, but our DMS Software Reengineering Toolkit is such a program transformation engine. It has an XML front end that is capable of parsing DTDs, and in fact we've built code generators using those DTDs.

Upvotes: 1
