Reputation: 43
Can anyone direct me to a good tutorial in building an XML parser? I realize most languages already have libraries to do this task, but I'm interested in learning about the grammar of XML and the theory behind how parsers work. I've tried searching for something that explains this but have been unable to find anything.
Upvotes: 3
Views: 2513
Reputation: 55833
Just to make it clear, you should NEVER EVER try to write your own XML parser from scratch to be used in production. This is
XML and its related languages are rather complex and contain many potential implementation pitfalls that can lead to security issues (for example, entity-expansion "billion laughs" attacks or external entities that read local files). So, to get a grasp of the languages, you can of course toy with writing a parser, but you should not use that homegrown parser for anything serious.
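If you take the "use a real library" route, the library still has to be configured safely. Here is a minimal Java sketch that hardens the JDK's built-in DOM parser against DTD-based attacks such as entity expansion and external entities. It assumes the default JAXP implementation, which is Xerces-derived and understands the `disallow-doctype-decl` feature; other JAXP providers may reject that feature name. The class name `SafeParse` is only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class SafeParse {
    // Builds a DOM parser that refuses DTDs outright, which rules out
    // entity-expansion ("billion laughs") and external-entity (XXE) payloads.
    static DocumentBuilder newSafeBuilder() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Feature URI understood by the Xerces-derived parser bundled with the JDK.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf.newDocumentBuilder();
    }

    // Parses a string with the hardened builder; throws SAXException on bad or disallowed input.
    static Document parse(String xml) throws Exception {
        return newSafeBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

`SafeParse.parse("<greeting>hi</greeting>")` returns an ordinary `Document`, while any input containing a `<!DOCTYPE ...>` is rejected with a `SAXParseException`. Disallowing DOCTYPE entirely is the bluntest option; documents that legitimately need a DTD call for finer-grained settings.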
To get an overview of XML, I suggest reading "XML in a Nutshell" from O'Reilly and simply experimenting with XML and XML transformations. For general parser building, Parsing Techniques (Grune and Jacobs) looks really promising. But parsing XML is actually rather hard, so you should probably start by gaining experience using it. Documentation is also far less sparse in that area...
Upvotes: 2
Reputation: 116582
I think there isn't enough demand for anyone to write such tutorials, and, as I commented, I don't think general parser techniques help much. XML is not something the usual lex+yacc approach handles well (the lexer part being more of a problem than the parser, for what that's worth).
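To illustrate the lexer point: what a character means depends on where it appears. Inside a tag, whitespace separates names, and `=` and quotes are significant; in content, the same characters are just data. Below is a minimal sketch of a stateful (two-mode) lexer, assuming a tiny subset of XML with no comments, CDATA sections, processing instructions, DOCTYPE or entity references. `ModalLexer` and its token spellings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-mode lexer for a tiny XML subset: no comments, CDATA,
// processing instructions, DOCTYPE or entity references.
class ModalLexer {
    enum Mode { CONTENT, TAG }

    static List<String> tokenize(String in) {
        List<String> out = new ArrayList<>();
        Mode mode = Mode.CONTENT;
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (mode == Mode.CONTENT) {
                if (c == '<') {
                    // "</" starts an end tag, "<" a start tag; either way we enter TAG mode.
                    boolean endTag = i + 1 < in.length() && in.charAt(i + 1) == '/';
                    out.add(endTag ? "</" : "<");
                    i += endTag ? 2 : 1;
                    mode = Mode.TAG;
                } else {
                    // Character data runs up to the next '<'; whitespace here is data.
                    int start = i;
                    while (i < in.length() && in.charAt(i) != '<') i++;
                    out.add("TEXT(" + in.substring(start, i) + ")");
                }
            } else if (Character.isWhitespace(c)) {
                i++; // inside a tag, whitespace only separates tokens
            } else if (c == '>') {
                out.add(">"); i++; mode = Mode.CONTENT;
            } else if (in.startsWith("/>", i)) {
                out.add("/>"); i += 2; mode = Mode.CONTENT;
            } else if (c == '=') {
                out.add("="); i++;
            } else if (c == '"' || c == '\'') {
                int close = in.indexOf(c, i + 1); // value ends at the matching quote
                if (close < 0) throw new IllegalArgumentException("unterminated attribute value");
                out.add("VALUE(" + in.substring(i + 1, close) + ")");
                i = close + 1;
            } else {
                int start = i;
                while (i < in.length() && !Character.isWhitespace(in.charAt(i))
                        && "=/<>\"'".indexOf(in.charAt(i)) < 0) i++;
                if (i == start) throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
                out.add("NAME(" + in.substring(start, i) + ")");
            }
        }
        return out;
    }
}
```

For example, `tokenize("<a x='1'>hi <b/></a>")` gives `[<, NAME(a), NAME(x), =, VALUE(1), >, TEXT(hi ), <, NAME(b), />, </, NAME(a), >]`, whereas `tokenize("a x='1'")` is a single `TEXT` token. flex's start conditions can model such modes, but real XML has many more of them (attribute values with references, CDATA, comments, DTD internals).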
I know most production-ready XML parsers are beasts, but you might be best off starting by reading one. Java has a few examples, and xmlpull might be among the simplest proper parsers. Woodstox and Xerces are the most compliant ("full") parsers, with large codebases, so definitely not light reading. But they handle everything an XML parser should, so they might be educational too. But beware of half-baked fake parsers that skip checks the XML specification mandates (Javolution, for example, checks very few things: it does no character validity checks and does not detect duplicate attribute names).
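As an example of the kind of checks the specification requires and a half-baked parser might skip, here is a Java sketch of two of them: the `Char` production from XML 1.0 (which characters may appear in a document at all) and the "Unique Att Spec" well-formedness constraint (no attribute name may appear twice in the same tag). The class and method names are made up; a real parser would apply these inline while scanning rather than as separate passes.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helpers for two XML 1.0 well-formedness checks.
class WellFormednessChecks {
    // XML 1.0 "Char" production: #x9 | #xA | #xD | [#x20-#xD7FF]
    //                            | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Scans by code point, so a valid surrogate pair counts as one character
    // and an unpaired surrogate is rejected.
    static void checkChars(String text) {
        text.codePoints().forEach(cp -> {
            if (!isXmlChar(cp))
                throw new IllegalArgumentException(String.format("illegal XML character U+%04X", cp));
        });
    }

    // "Unique Att Spec": an attribute name may appear at most once per tag.
    static void checkUniqueAttributes(List<String> attributeNames) {
        Set<String> seen = new HashSet<>();
        for (String name : attributeNames) {
            if (!seen.add(name))
                throw new IllegalArgumentException("duplicate attribute: " + name);
        }
    }
}
```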
Another obvious thing to read is the XML specification itself. IMO it is one of the best-written specifications: accurate and complete, even if not exactly light reading. And considering all it covers, it's actually not that long.
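To see how the specification's EBNF maps onto code, here is a minimal recursive-descent sketch in Java with one method per production (`element`, `content`, `AttValue`, `Name`, `S`). It is a hypothetical toy covering only a small subset: no entity or character references, CDATA, comments, processing instructions, DOCTYPE or XML declaration, and an ASCII-only approximation of the real `Name` rule. Note that matching start- and end-tag names is not part of the EBNF itself; the spec states it as the "Element Type Match" well-formedness constraint, which the code checks explicitly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical recursive-descent parser for a small XML subset, one method per
// grammar production. Omits references, CDATA, comments, PIs, DOCTYPE and the
// XML declaration, and uses a simplified ASCII-only Name rule.
class MiniXmlParser {
    static final class Element {
        final String name;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Object> children = new ArrayList<>(); // Element or String (character data)
        Element(String name) { this.name = name; }
    }

    private final String in;
    private int pos = 0;
    private MiniXmlParser(String in) { this.in = in; }

    static Element parse(String xml) {
        MiniXmlParser p = new MiniXmlParser(xml);
        p.skipS();
        Element root = p.element();
        p.skipS();
        if (p.pos != p.in.length()) throw p.error("content after root element");
        return root;
    }

    // element ::= EmptyElemTag | STag content ETag
    private Element element() {
        expect("<");
        Element e = new Element(name());
        while (true) {                                // (S Attribute)* S?
            boolean hadSpace = skipS();
            if (peek('>') || peek('/')) break;
            if (!hadSpace) throw error("expected whitespace, '>' or '/>'");
            String att = name();
            skipS(); expect("="); skipS();            // Eq ::= S? '=' S?
            if (e.attributes.put(att, attValue()) != null) throw error("duplicate attribute " + att);
        }
        if (tryConsume("/>")) return e;               // EmptyElemTag
        expect(">");                                  // end of STag
        content(e);
        expect("</");                                 // ETag ::= '</' Name S? '>'
        if (!name().equals(e.name)) throw error("end tag does not match <" + e.name + ">");
        skipS();
        expect(">");
        return e;
    }

    // content ::= CharData? (element CharData?)*   (subset of the real rule)
    private void content(Element parent) {
        while (pos < in.length() && !in.startsWith("</", pos)) {
            if (peek('<')) { parent.children.add(element()); continue; }
            int start = pos;
            while (pos < in.length() && in.charAt(pos) != '<') {
                if (in.charAt(pos) == '&') throw error("references are not supported here");
                pos++;
            }
            parent.children.add(in.substring(start, pos));
        }
    }

    // AttValue ::= '"' [^<&"]* '"' | "'" [^<&']* "'"   (references omitted)
    private String attValue() {
        char q = pos < in.length() ? in.charAt(pos) : '\0';
        if (q != '"' && q != '\'') throw error("expected quoted attribute value");
        int close = in.indexOf(q, pos + 1);
        if (close < 0) throw error("unterminated attribute value");
        String v = in.substring(pos + 1, close);
        if (v.indexOf('<') >= 0 || v.indexOf('&') >= 0) throw error("'<' or '&' in attribute value");
        pos = close + 1;
        return v;
    }

    // Simplified Name: [A-Za-z_:] followed by [A-Za-z0-9_:.-]*
    private String name() {
        int start = pos;
        while (pos < in.length()) {
            char c = in.charAt(pos);
            boolean first = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || c == ':';
            boolean rest = (c >= '0' && c <= '9') || c == '-' || c == '.';
            if (!(first || (pos > start && rest))) break;
            pos++;
        }
        if (pos == start) throw error("expected a name");
        return in.substring(start, pos);
    }

    // S ::= (#x20 | #x9 | #xD | #xA)+ ; returns true if anything was skipped
    private boolean skipS() {
        int start = pos;
        while (pos < in.length() && " \t\r\n".indexOf(in.charAt(pos)) >= 0) pos++;
        return pos > start;
    }

    private boolean peek(char c) { return pos < in.length() && in.charAt(pos) == c; }
    private boolean tryConsume(String s) { boolean ok = in.startsWith(s, pos); if (ok) pos += s.length(); return ok; }
    private void expect(String s) { if (!tryConsume(s)) throw error("expected '" + s + "'"); }
    private IllegalArgumentException error(String msg) { return new IllegalArgumentException(msg + " at offset " + pos); }
}
```

`MiniXmlParser.parse("<a x='1'>hi<b/>!</a>")` yields an `a` element with one attribute and three children (`"hi"`, the empty `b` element, and `"!"`). Mismatched tags, duplicate attributes, or missing whitespace between attributes raise `IllegalArgumentException`.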
Upvotes: 0
Reputation: 163458
If you're a student of computer science and fancy writing an XML parser as an academic exercise, then fine: it's a good way to spend a wet weekend, and you don't need to ask the question because you have access to a library of textbooks on how to write parsers, and if you have specific XML-related problems then you can always look into the code of various open-source parsers to see how experts have tackled the problem.
If you're not a student of computer science then I would suggest you become one - the theory of how to write parsers for different classes of grammar is part of the foundation of the subject.
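One textbook result that shows up directly in XML: arbitrarily deep nesting cannot be recognized by a finite automaton, so a parser needs a stack, either an explicit one or the call stack of a recursive-descent parser. Here is a sketch of the explicit version, checking start/end tag matching over an already-tokenized stream. The string encoding of events ("a" for `<a>`, "/a" for `</a>`) and the `TagMatcher` name are hypothetical simplifications.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Checks that start and end tags nest and match, using an explicit stack
// (the pushdown automaton that recursive descent gets implicitly from the
// call stack). Hypothetical event encoding: "a" = <a>, "/a" = </a>.
class TagMatcher {
    static boolean wellNested(List<String> events) {
        Deque<String> open = new ArrayDeque<>();
        for (String ev : events) {
            if (ev.startsWith("/")) {
                // An end tag must close the most recently opened element.
                if (open.isEmpty() || !open.pop().equals(ev.substring(1))) return false;
            } else {
                open.push(ev);
            }
        }
        return open.isEmpty(); // every start tag must have been closed
    }
}
```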
Upvotes: 0