Raheel Khan

Reputation: 14787

Minimize the size of XML files

I have a client/server application where data is exchanged in XML format. The data comes to around 50MB, most of which consists of the XML tags themselves. Is there a way to take the generated XML and replace the node names with short indexed names, as follows:

<User><Assessments><Assessment ID="1" Name="some name" /></Assessments></User>

to:

<A><B><C ID="1" Name="some name" /></B></A>

This would remove a great deal of bloat.

EDIT
This data is serialized from Entity Framework objects. XML was chosen as the protocol because of its intrinsic support in .NET and the smart code generation of FromXml and ToXml methods for entities, which circumvents circular references.

Upvotes: 0

Views: 3828

Answers (5)

Raheel Khan

Reputation: 14787

I ended up writing a small class that renames the nodes and creates a mapping element so the process can be reversed. That alone took the file size down from 50MB to 10MB.

Compressing the file would be the next step, but I wonder how much space I could save using binary serialization. I have not tried that before.

Upvotes: 0

Asif Mushtaq

Reputation: 13150

Alternatively, you could consider JSON instead of XML. It usually takes less space because it has no closing tags.
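As a rough illustration, the sketch below serializes the single record from the question both ways and compares the byte counts. The JSON layout is just one possible mapping of that XML, chosen for this example, and the sketch is in Python rather than .NET.

```python
import json
import xml.etree.ElementTree as ET

record = {"ID": "1", "Name": "some name"}  # sample data from the question

# XML form, matching the structure in the question.
user = ET.Element("User")
assessments = ET.SubElement(user, "Assessments")
ET.SubElement(assessments, "Assessment", record)
xml_bytes = ET.tostring(user, encoding="unicode").encode("utf-8")

# One possible JSON form of the same data (an assumption), without whitespace.
json_bytes = json.dumps(
    {"User": {"Assessments": [record]}}, separators=(",", ":")
).encode("utf-8")

print(len(xml_bytes), len(json_bytes))
```

How much is saved on real data depends on the shape of the document, so it is worth measuring on an actual payload.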

Upvotes: 0

nyxthulhu

Reputation: 9752

The point of XML is that you shouldn't need to compress/minimise the data. If you need to minimise what's going down the wire then there's a good chance you're using the wrong protocol.

Obviously you can pass this through a gzip stream, which will get you a massive advantage, but if you want to squeeze even more out of it than that then it may be worth looking at JSON or even a binary format.

XML was designed to be readable by humans, and by removing the readability you're essentially removing one of the major reasons to use XML in the first place.

Upvotes: 1

ChrisF

Reputation: 137128

You could look at using Attributes for your data rather than Elements. For example, if you have "gender" as an attribute you will get:

<person gender="female">
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>

whereas if it is an Element you will get:

<person>
  <gender>female</gender>
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>

It's not strictly correct but will achieve what you are after.

Upvotes: 1

mathieu

Reputation: 31192

What about just compressing/decompressing your data stream between the client and the server? This will be easier to implement and much less error-prone than doing some custom transformation on the XML data.
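A minimal sketch of that approach, in Python rather than .NET: the payload below is made-up sample data built from the record in the question, and in .NET the analogous class would be `System.IO.Compression.GZipStream`.

```python
import gzip

# Hypothetical payload: many repeated records, as in a large serialized export.
record = '<Assessment ID="{0}" Name="some name" />'
payload = (
    "<User><Assessments>"
    + "".join(record.format(i) for i in range(10000))
    + "</Assessments></User>"
).encode("utf-8")

compressed = gzip.compress(payload)     # sender side, before writing to the wire
restored = gzip.decompress(compressed)  # receiver side, before parsing the XML

print(len(payload), len(compressed))
```

Repeated tag names are exactly the kind of redundancy gzip removes, and the XML that both ends parse stays unchanged and readable.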

Upvotes: 4
