Reputation: 3865
i am looking at building a simple xml parser with c99, i want to implement every single detail to it just for learning purposes, from my understanding the best way is implementing a tree structure and tokenizing the xml string into a tree structure so it will look something like
and i will have 2 simple structs one that represents a node and one that represents an attribute, how bad is the above design?
any suggestions for improvement?
Upvotes: 4
Views: 1690
Reputation: 163342
In looking at the design of the tree, it's worth writing down and prioritising your objectives, because they help you make decisions about trade-offs. I think there are probably three key metrics: time for building the tree, time for navigating the tree (typically in top-down recursive descent), and space occupancy. Plus development effort, of course. The other important factor is whether you want the tree to be mutable (e.g. as required by DOM) or immutable (e.g. for XPath/XSLT/XQuery).
Other factors specific to XML: how much information do you want to retain in the tree? E.g. do you want to retain CDATA section boundaries? And entity references? Or do you want to expand these inline?
I don't know what c99 is, that may impose additional constraints or provide opportunities.
Upvotes: 1
Reputation: 29126
The complexity of your chosen task aside, your data structure looks good at first sight, but in my opinion there are two or three things wrong:
So you really need a binary tree for the xml structure itself and a linked list of attributes for each node. For example, consider this simple xml-style data:
<dinner time="19:00" dresscode="informal">
<course id="starter">
<food>Consomme</food>
<food>Tomato soup</food>
<course>
<course id="salad" optional=optional>
<food>Green salad</food>
<course>
<course id="main">
<food>Steak and kidney pie</food>
<food type=vegetarian>Spinach lasagna</food>
<course>
<course id="dessert">
<food>Fruit</food>
<food>Ice cream</food>
<food>Coffee</food>
<course>
</dinner>
The food
items are the children of the course
s, but are siblings of each other if they have the same course
as parent. The tree structure looks like the indentation: Items on the same level are siblings, indented items are children.
You need only keep a pointer to the oldest child, other children are reachable via the sibling relationship, which is also a pointer. (In binary-tree nomenclature, children are the left
links and siblings are the right
links.) For easy traversing you should also keep a pointer to the parent.
The textual content and the attributes are just data attached to the nodes.
(Of course, looking at the source of existing XML parsers might give you better ideas.)
Upvotes: 2