Reputation: 30077
We are beginning to roll out more and more WAN deployments of our product (.NET fat client with an IIS hosted Remoting backend). Because of this we are trying to reduce the size of the data on the wire.
We have overridden the default serialization by implementing ISerializable (similar to this), and we are seeing anywhere from 12% to 50% gains. Most of our efforts focus on optimizing arrays of primitive types. Is there a fancy way of serializing primitive types, beyond the obvious?
For example, today we serialize an array of ints as follows:
[4-bytes (array length)][4-bytes][4-bytes]
Can anyone do significantly better?
The most obvious example of a significant improvement, for boolean arrays, is putting 8 bools in each byte, which we already do.
Note: Saving 7 bits per bool may seem like a waste of time, but when you are dealing with large magnitudes of data (which we are), it adds up very fast.
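For reference, the bit-packing mentioned above can be sketched roughly as follows. This is an illustration in Python, not our .NET code; the function names are hypothetical, and it assumes the element count is stored separately (for example in the length prefix) so that padding bits in the last byte can be ignored.

```python
def pack_bools(flags):
    """Pack a sequence of booleans into bytes, 8 flags per byte (lowest bit first)."""
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def unpack_bools(data, count):
    """Recover `count` booleans from bytes produced by pack_bools."""
    return [bool(data[i // 8] & (1 << (i % 8))) for i in range(count)]
```

Nine flags fit in two bytes this way, versus nine bytes at one byte per bool.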
Note: We want to avoid general compression algorithms because of the latency associated with them. Remoting only supports buffered requests/responses (no chunked encoding). I realize there is a fine line between compression and optimal serialization, but our tests indicate we can afford very specific serialization optimizations at very little cost in latency, whereas reprocessing the entire buffered response into a new compressed buffer is too expensive.
Upvotes: 6
Views: 3640
Reputation: 416039
Before implementing ISerializable yourself, you were probably using XmlSerializer or the SOAP formatter in a web service. Given that all your clients are fat clients running .NET, you could try using the BinaryFormatter.
Upvotes: -1
Reputation: 31593
Yes, there is a fancy way of serialising primitive types. As a bonus it is also much faster (typically 20-40 times).
Simon Hewitt's open source library, see Optimizing Serialization in .NET - part 2, uses various tricks. For example, if it is known that an array contains small integers, then fewer bytes are written to the serialised output. This is described in detail in part 1 of the article. For example:
...So, an Int32 that is less than 128 can be stored in a single byte (by using 7-bit encoding) ....
The full and the size optimised integers can be mixed and matched. This may seem obvious, but there are other things; for example, special things happen for integer value 0 - optimisation to store numeric type and a zero value.
Part 1 states:
... If you've ever used .NET remoting for large amounts of data, you will have found that there are problems with scalability. For small amounts of data, it works well enough, but larger amounts take a lot of CPU and memory, generate massive amounts of data for transmission, and can fail with Out Of Memory exceptions. There is also a big problem with the time taken to actually perform the serialisation - large amounts of data can make it unfeasible for use in apps ....
I have used this library with great success in my application.
To make sure .NET serialisation is never used, put an ASSERT 0, Debug.WriteLine() or similar into the place in the library code where it falls back on .NET serialisation. That's at the end of function WriteObject() in file FastSerializer.cs, near createBinaryFormatter().Serialize(BaseStream, value);.
Upvotes: 2
Reputation: 41688
If you want to control the serialization format yourself, with just library help for compact integer storage, derive a class from BinaryWriter that uses Write7BitEncodedInt. Do likewise for BinaryReader.Read7BitEncodedInt.
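To show what ends up on the wire, here is a minimal sketch of the 7-bit layout: the low 7 bits are written first, and the high bit of each byte means "more bytes follow". It is written in Python for illustration only and is not the .NET implementation; the function names are hypothetical, and it assumes the value is treated as an unsigned 32-bit number (so a negative Int32 would take the full 5 bytes).

```python
def encode_7bit_int(value):
    """Encode an unsigned 32-bit value, low 7 bits first; high bit = continuation."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value must fit in 32 unsigned bits")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # 7 payload bits plus the 'more' flag
        value >>= 7
    out.append(value)  # final byte has the high bit clear
    return bytes(out)


def decode_7bit_int(data, pos=0):
    """Decode one value starting at `pos`; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```

Values below 128 take one byte and values below 16384 take two, at the cost of a fifth byte for anything that needs more than 28 bits.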
Upvotes: 0
Reputation: 1063328
(relates to messages/classes, not just primitives)
Google designed "protocol buffers" for this type of scenario (they shift a huge amount of data around) - their format is compact (using things like base-128 encoding) but extensible and version tolerant (so clients and servers can upgrade easily).
In the .NET world, I can recommend 2 protocol buffers implementations:
For info, protobuf-net has direct support for ISerializable and remoting (it is part of the unit tests). There are performance/size metrics here.
And best of all, all you do is add a few attributes to your classes.
Caveat: it doesn't claim to be the theoretical best - but pragmatic and easy to get right - a compromise between performance, portability and simplicity.
Upvotes: 5
Reputation: 8153
Here's a trick I used once for encoding arrays of integers:
For example, to represent unsigned integers 0x0000017B, 0x000000A9, 0xC247E8AD and 0x00032A64, you would write (assuming little-endian): B1, 7B, 01, A9, AD, E8, 47, C2, 64, 2A, 03.
It can save you up to 68.75% (11/16) of space in the best case. In the worst case, you would actually waste an additional 6.25% (1/16).
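The step-by-step description of the trick seems to be missing here, so the following is only a sketch of a scheme that is consistent with the example bytes: integers are taken in groups of four, a header byte stores (byte count - 1) for each integer in 2 bits, with the first integer in the lowest two bits, and each integer then follows in little-endian order using only the bytes it needs. The function names and the fixed group size of four are assumptions; the code is Python for illustration.

```python
def encode_group(values):
    """Encode exactly four unsigned 32-bit ints as a header byte plus 1-4 bytes each."""
    if len(values) != 4:
        raise ValueError("expects exactly four values")
    header = 0
    body = bytearray()
    for i, v in enumerate(values):
        if not 0 <= v <= 0xFFFFFFFF:
            raise ValueError("value must fit in 32 unsigned bits")
        nbytes = max(1, (v.bit_length() + 7) // 8)  # bytes actually needed (1..4)
        header |= (nbytes - 1) << (2 * i)           # 2 bits per value, first value lowest
        body += v.to_bytes(nbytes, "little")
    return bytes([header]) + bytes(body)


def decode_group(data):
    """Decode four values from bytes produced by encode_group."""
    header = data[0]
    pos = 1
    values = []
    for i in range(4):
        nbytes = ((header >> (2 * i)) & 0b11) + 1
        values.append(int.from_bytes(data[pos:pos + nbytes], "little"))
        pos += nbytes
    return values
```

Under these assumptions, four values that each fit in one byte take 5 bytes instead of 16, and four values that each need all four bytes take 17, which matches the best and worst cases quoted above.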
Upvotes: 1
Reputation: 95355
If your arrays can be sorted you can perform a simple RLE to save space. Even if they aren't sorted RLE can still be beneficial. It is fast to implement for both writing and reading.
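A minimal sketch of what that could look like, in Python for illustration (the function names are hypothetical, and the (value, run length) pairs would still need to be written with whatever integer encoding you use):

```python
def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1       # extend the current run
        else:
            runs.append([v, 1])    # start a new run
    return [(v, n) for v, n in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original list."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out
```

Sorting groups equal values together, which is why it helps; on data with few repeats, the pairs can end up larger than the original array.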
Upvotes: 1
Reputation: 28606
For integers, if you usually have small numbers (under 128 or 16384), you can encode the number in 7-bit groups, using the MSB of each byte as a flag to determine if it's the last byte or not. It is a little bit similar to UTF-8, but the flag bit is actually wasted (which is not the case with UTF-8).
Example (big-endian):
125 which is usually encoded as 00 00 00 7D
Could be encoded as 7D
270 which is usually encoded as 00 00 01 0E
Could be encoded as 82 0E
The main limitation is that, if the encoding is capped at four bytes, the effective range of a 32-bit value is reduced to 28 bits (4 × 7). But for small values you will usually gain a lot.
This method is actually used in old formats such as MIDI because old electronics needed very efficient and simple encoding techniques.
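A rough sketch of this big-endian variant, matching the two examples above (most significant 7-bit group first, flag bit set on every byte except the last). It is Python for illustration; the function names are hypothetical, and the 28-bit cap is an assumption taken from the four-byte limit.

```python
def vlq_encode(value):
    """Encode a value below 2**28, most significant 7-bit group first."""
    if not 0 <= value < (1 << 28):
        raise ValueError("value must fit in 28 bits")
    groups = [value & 0x7F]                    # last byte: flag bit clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)   # earlier bytes: flag bit set
        value >>= 7
    return bytes(reversed(groups))


def vlq_decode(data, pos=0):
    """Decode one value starting at `pos`; return (value, next_position)."""
    result = 0
    while True:
        byte = data[pos]
        pos += 1
        result = (result << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return result, pos
```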
Upvotes: 0
Reputation: 56123
If you know which int values are more common, you can encode those values in fewer bits (and encode the less-common values using correspondingly more bits): this is called "Huffman" coding/encoding/compression.
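As a rough illustration of the idea, the sketch below builds a Huffman code table from observed value frequencies. It is Python, the function name is hypothetical, and it stops at the table: a real encoder would also have to pack the bit strings into bytes and make the table available to the receiver.

```python
import heapq
from collections import Counter


def huffman_code(values):
    """Return {value: bit string}, with shorter codes for more frequent values."""
    freq = Counter(values)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code built so far})
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]
```

For example, with the values 0, 1, 5 and 1000 occurring 8, 4, 2 and 2 times, the codes come out 1, 2, 3 and 3 bits long, for 28 bits in total instead of 16 × 32.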
In general though I'd suggest that one of the easiest things you could do would be to run a standard 'zip' or 'compression' utility over your data.
Upvotes: 0
Reputation: 136697
Check out the base-128 varint type used in Google's protocol buffers; that might be what you're looking for.
(There are a number of .NET implementations of protocol buffers available if you search the web which, depending on their license, you might be able to grovel some code from!)
Upvotes: 2