Andreas Jansson

Reputation: 890

XML structure for fastest lookup

Is one of the following structures faster than the other when it comes to looking up a certain "resource" in the XML below?

Sample 1.


<root>
 <resource key="res_test_1" value="test"/>
 <resource key="res_test_2" value="test 2"/>
 <resource key="res_test_3" value="test 3"/>
</root>

Sample 2.


<root>
 <res_test_1>test</res_test_1>
 <res_test_2>test 2</res_test_2>
 <res_test_3>test 3</res_test_3>
</root>

The "keys" are always valid XML element names.

I'm asking because this set of resource keys and values will be part of an XML file that will be processed by XSL, replacing certain "keys" in the XML with the values from the resource part of the same file, and I would like to structure the resource part as optimally as possible for the lookups that will be needed. I'm using C# and the XslCompiledTransform object to run the transform.

My instinct says that the object model might get to the keys faster when they are the actual element names, but I can find no advice on this kind of question. Perhaps it's unimportant to think about this, since the whole XML document will be in memory during the transform.

Edit (adding more info from here on down): As I've already indicated, this question might be theoretical (focusing on a few milliseconds is not relevant), but my reason for asking was to get an opinion on exactly this: is one of the two samples faster than the other when it comes to locating data in an XML structure? Is one or the other the preferred way, for any reason?

As I see it, the first sample involves more "work" for a processor to locate and return the value when it is asked for.
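One way to test that intuition is to measure it. The following is only a rough sketch in Python's standard-library ElementTree, not the C# / XslCompiledTransform setup described above, so the numbers say nothing directly about .NET. The entry count `N` and the function names are made up for illustration. It times the attribute-predicate lookup, the element-name lookup, and a dictionary built once up front (comparable in spirit to declaring an xsl:key in a stylesheet).

```python
import timeit
import xml.etree.ElementTree as ET

N = 2000  # hypothetical number of resources; adjust to match your data

# Build both layouts in memory with the same keys and values.
root1 = ET.Element("root")
root2 = ET.Element("root")
for i in range(N):
    key, value = f"res_test_{i}", f"test {i}"
    ET.SubElement(root1, "resource", key=key, value=value)  # Sample 1 layout
    ET.SubElement(root2, key).text = value                  # Sample 2 layout

target = f"res_test_{N - 1}"  # last entry, the worst case for a linear scan


def lookup_sample1():
    # Attribute predicate: checks @key on each <resource> child in turn.
    return root1.find(f"resource[@key='{target}']").get("value")


def lookup_sample2():
    # Name test: compares the tag of each child against the key in turn.
    return root2.find(target).text


# Index built once from the Sample 1 layout; later lookups are hash lookups.
index = {el.get("key"): el.get("value") for el in root1}


def lookup_index():
    return index[target]


for fn in (lookup_sample1, lookup_sample2, lookup_index):
    seconds = timeit.timeit(fn, number=200)
    print(f"{fn.__name__}: {seconds:.4f} s for 200 lookups")
```

In this library both path lookups walk the children of the root one by one, so either layout scales with the number of entries. If lookups turn out to matter, an index tends to make a bigger difference than the choice between attributes and element names.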

This is a sample XPath for Sample 1: /root/resource[@key="res_test_2"]/@value

Corresponding XPath for Sample 2: /root/res_test_2
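To confirm that the two expressions return the same value, here is a minimal sketch using Python's standard library (an assumption made for the sake of a short runnable example; the question itself targets C#). ElementTree supports only a subset of XPath and evaluates paths relative to the element you call `find()` on, so the leading `/root` is dropped and the attribute is read with `get()`.

```python
import xml.etree.ElementTree as ET

SAMPLE_1 = """<root>
 <resource key="res_test_1" value="test"/>
 <resource key="res_test_2" value="test 2"/>
 <resource key="res_test_3" value="test 3"/>
</root>"""

SAMPLE_2 = """<root>
 <res_test_1>test</res_test_1>
 <res_test_2>test 2</res_test_2>
 <res_test_3>test 3</res_test_3>
</root>"""

root1 = ET.fromstring(SAMPLE_1)
root2 = ET.fromstring(SAMPLE_2)

# Sample 1: select the <resource> whose key matches, then read its value attribute.
value1 = root1.find("resource[@key='res_test_2']").get("value")

# Sample 2: the key is the element name, and the value is the element text.
value2 = root2.find("res_test_2").text

print(value1, value2, sep=" | ")  # prints "test 2 | test 2"
```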

Also, the structure of sample 2 requires less space, which will improve load time, as indicated by one of the answers below. A good point, at least for very large documents.

When I come to think of it: an obvious downside of sample 2 is that an XSD schema would not be of much use, since this part of the XML would have dynamic element names, which might be what the advice to put all values in attributes (see answer below) was about.

I made these XPath samples since they are easy to demonstrate. A similar lookup will be needed in the XSL transform that I wrote about earlier, but the focus of this question should be the structure of the document, as a more generic question.

Thanks, Andreas

Upvotes: 2

Views: 662

Answers (2)

Rookie Programmer Aravind

Reputation: 12154

Between sample1 and sample2, the only difference is whether the data sits in an attribute or in an element. Reading a child attribute costs about the same effort as reading a child element.

example:

<!--example1-->
<root>
  <child id="something"/>
</root>

<!--example2-->
<root>
  <child>
    <id>something</id>
  </child>
</root>

The XPath for reading "something" from the first example is /root/child/@id, and from the second example it is /root/child/id.

That is not a big difference. But if you look at the size, example2 is slightly bigger. Now assume you have a huge list of such nodes: the example2 file would be bulkier than the example1 file, so the example2 data weighs more.

Coming back to your examples: if you look at the structure, sample1 is longer than sample2.
Assume the same files hold a huge amount of data in the respective hierarchy.
If you read sample1 and sample2 using C# code, the code would take more time to load sample1 (due to its size); compared to that, the processing time (I mean the time spent reading nodes) would be negligible.
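How much smaller sample2 is depends on the key length, because sample2 writes the key twice (opening and closing tag). Counting characters per entry, sample1 costs 27 + k + v and sample2 costs 5 + 2k + v, where k and v are the key and value lengths, so sample2 is the smaller one only for keys shorter than 22 characters. A small Python sketch of that arithmetic follows. The helper names are hypothetical, and indentation, line breaks, and escaping are ignored.

```python
def sample1_entry(key: str, value: str) -> str:
    # Sample 1 layout: fixed element name, key and value as attributes.
    return f'<resource key="{key}" value="{value}"/>'


def sample2_entry(key: str, value: str) -> str:
    # Sample 2 layout: the key is the element name and appears twice.
    return f"<{key}>{value}</{key}>"


for key in ("res_test_1", "res_" + "x" * 30):
    size1 = len(sample1_entry(key, "test"))
    size2 = len(sample2_entry(key, "test"))
    print(f"key length {len(key)}: sample 1 = {size1} chars, sample 2 = {size2} chars")

# prints:
# key length 10: sample 1 = 41 chars, sample 2 = 29 chars
# key length 34: sample 1 = 65 chars, sample 2 = 77 chars
```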

@OP, as you already know:

XPath for Sample 1: /root/resource[@key="res_test_2"]/@value

Corresponding XPath for Sample 2: /root/res_test_2

Sample1 certainly goes one level down compared to Sample2, but as I mentioned earlier, I have observed that this won't make much difference to the parser, and I have already explained the effect of size on reading the file. There is something else I would like to point out.
Using attributes is a wise choice. It's not a rule, but we usually use attributes for metadata.
Example:

<root>
  <child id="1">some Data</child>
  <child id="2">Some other Data</child>
</root>

If you look at the above sample XML, the attribute "id" is used as metadata about the data of the child node; the id isn't data itself, it is just a sub-message.
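As a small illustration of that split, the sketch below (Python standard library, chosen only for brevity; the variable names are made up) treats the id attribute as the handle and the element text as the payload.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""<root>
  <child id="1">some Data</child>
  <child id="2">Some other Data</child>
</root>""")

# The id attribute (metadata) identifies the node; the element text is the data.
data_by_id = {child.get("id"): child.text for child in doc.findall("child")}

print(data_by_id["2"])  # prints "Some other Data"
```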


Take another example:

<html>
   <body>
       <div class="style1">Here is the display text.</div>
   </body>
</html>

The above example is nothing but HTML code :) Here the attribute class has the value "style1". This class name is then used in a CSS file to add properties and styles to the text under the tag.

Upvotes: 0

Luixv

Reputation: 8710

A short while ago I asked something about XSLT performance and got the following answer:

Using attributes instead of elements improves the performance. When performing XPath matches, attributes are faster because they are loosely typed. This makes validation of the schema easier.

(See this question)

Upvotes: 1
