I'm trying to write a byte[] into XML as hex, like:
new byte[] { 1, 2, 3, 10 } => "0102030A"
I've seen good posts about the conversion itself, but I didn't find a good way to write chars into XML one at a time, since XmlWriter has no WriteChar method and no WriteRaw overload that takes a single char (as TextWriter has).
Here's what I'm doing at the moment:
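using System.Xml;

// Extension methods must live in a top-level static class (class name is arbitrary).
public static class XmlWriterHexExtensions
{
    const string HexChars = "0123456789ABCDEF";

    public static void WriteHex(this XmlWriter writer, byte[] bytes)
    {
        unchecked
        {
            for (int i = 0; i < bytes.Length; i++)
            {
                var b = bytes[i];
                // Two WriteRaw(string) calls per byte, each allocating a 1-char string.
                writer.WriteRaw(HexChars[b >> 4].ToString());
                writer.WriteRaw(HexChars[b & 15].ToString());
            }
        }
    }
}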
I don't want to allocate a new array twice the size of the byte[] and then write it to the XML. The WriteBinHex method adds hyphens between values, which is why I didn't use it. I see that the base stream is exposed through a property, but I guess using it is a bad idea. What I'm trying to achieve is a more "streamy" way of doing this.
So my question is: what is the fastest way to write a single char into XML?
If I can't find a better way, I'm thinking of writing through a smaller char[] buffer in a loop.
EDIT:
Sorry, I was wrong about WriteBinHex: it produces exactly the output I was looking for. I'm adding some benchmarks as an answer, in case they help someone else.
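A quick way to confirm what WriteBinHex emits is to write a small array into a single element and look at the text. The sketch below assumes an element named "data" and XmlWriterSettings.OmitXmlDeclaration = true just to keep the output short; both choices are illustrative, not part of the original post.

```csharp
using System.IO;
using System.Xml;

static class BinHexDemo
{
    // Writes the bytes as BinHex text inside one <data> element
    // (element name is a hypothetical choice) and returns the XML fragment.
    public static string ToXml(byte[] bytes)
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var sw = new StringWriter();
        using (var writer = XmlWriter.Create(sw, settings))
        {
            writer.WriteStartElement("data");
            writer.WriteBinHex(bytes, 0, bytes.Length); // two uppercase hex digits per byte, no separators
            writer.WriteEndElement();
        }
        return sw.ToString();
    }
}
```

For new byte[] { 1, 2, 3, 10 } this should give <data>0102030A</data>, i.e. no hyphens between the values.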
I tried five methods, and here are the benchmarks.
The code is compiled in Release mode and timed with a Stopwatch. Four array lengths are measured (16 bytes, 1 KB, 128 KB and 2 MB), and the GC is collected before each measurement. Iteration counts differ per length so that the times end up in a similar range (e.g. byte[16] is iterated 100K times, byte[128K] 40 times). Each iteration creates an XmlWriter and writes the same byte[] into it as 10 elements.
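The measurement loop described above could look roughly like the sketch below. It is a reconstruction under assumptions, not the author's actual harness: the output target (Stream.Null), the element names "root" and "item", the untimed warm-up call and the method name MeasureMs are all hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Xml;

static class HexBenchmark
{
    // Times one hex-writing strategy. Assumptions: output is discarded via
    // Stream.Null and each byte[] goes into its own <item> element.
    public static long MeasureMs(Action<XmlWriter, byte[]> writeHex, byte[] bytes, int iterations)
    {
        // Untimed warm-up so JIT compilation is not included in the result.
        using (var warmup = XmlWriter.Create(Stream.Null))
        {
            warmup.WriteStartElement("root");
            writeHex(warmup, bytes);
            warmup.WriteEndElement();
        }

        // Collect garbage up front so earlier runs do not skew this one.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        var sw = Stopwatch.StartNew();
        for (int iter = 0; iter < iterations; iter++)
        {
            // A fresh writer per iteration, as in the setup described above.
            using (var writer = XmlWriter.Create(Stream.Null))
            {
                writer.WriteStartElement("root");
                for (int e = 0; e < 10; e++) // same byte[] written as 10 elements
                {
                    writer.WriteStartElement("item");
                    writeHex(writer, bytes);
                    writer.WriteEndElement();
                }
                writer.WriteEndElement();
            }
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

Usage for the baseline would be something like HexBenchmark.MeasureMs((w, b) => w.WriteBinHex(b, 0, b.Length), new byte[16], 100000).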
All methods are compared against XmlWriter's WriteBinHex:
writer.WriteBinHex(bytes, 0, bytes.Length);
All of the methods below run inside an unchecked block (unchecked { ... }).
Method-1: Full Char[]
var result = new char[bytes.Length * 2];
byte b;
for (int i = 0; i < bytes.Length; i++)
{
    b = bytes[i];
    result[i * 2] = HexChars[b >> 4];
    result[i * 2 + 1] = HexChars[b & 15];
}
writer.WriteRaw(result, 0, result.Length);
Method-2: Buffer
var bufferIndex = 0;
// Up to 4096 chars (2048 bytes) per WriteRaw call.
var bufferLength = bytes.Length < 2048 ? bytes.Length * 2 : 4096;
var buffer = new char[bufferLength];
for (int i = 0; i < bytes.Length; i++)
{
    var b = bytes[i];
    buffer[bufferIndex] = HexChars[b >> 4];
    buffer[bufferIndex + 1] = HexChars[b & 15];
    bufferIndex += 2;
    if (bufferIndex.Equals(bufferLength))
    {
        writer.WriteRaw(buffer, 0, bufferLength);
        bufferIndex = 0;
    }
}
// Flush the last, partially filled buffer.
if (bufferIndex > 0)
    writer.WriteRaw(buffer, 0, bufferIndex);
Method-3: RawCharByChar
for (int i = 0; i < bytes.Length; i++)
{
    var b = bytes[i];
    writer.WriteRaw(HexChars[b >> 4].ToString());
    writer.WriteRaw(HexChars[b & 15].ToString());
}
Method-4: StringFormatX2
for (int i = 0; i < bytes.Length; i++)
    writer.WriteRaw(bytes[i].ToString("X2")); // "X2" gives uppercase, matching the other methods ("x2" would give "0a")
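Before trusting the timings, it seems worth checking that a candidate produces byte-for-byte the same XML as WriteBinHex; small details such as ToString("x2") vs "X2" change the letter case, and buffer-boundary bugs only show up for inputs longer than one buffer. The sketch below reuses the Buffer method (Method-2) as a delegate; the class and helper names and the "data" element are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

static class HexEquivalenceCheck
{
    const string HexChars = "0123456789ABCDEF";

    // Method-2 (Buffer) from above, wrapped so it can be passed as a delegate.
    public static void WriteHexBuffered(XmlWriter writer, byte[] bytes)
    {
        var bufferIndex = 0;
        var bufferLength = bytes.Length < 2048 ? bytes.Length * 2 : 4096;
        var buffer = new char[bufferLength];
        for (int i = 0; i < bytes.Length; i++)
        {
            var b = bytes[i];
            buffer[bufferIndex] = HexChars[b >> 4];
            buffer[bufferIndex + 1] = HexChars[b & 15];
            bufferIndex += 2;
            if (bufferIndex == bufferLength)
            {
                writer.WriteRaw(buffer, 0, bufferLength);
                bufferIndex = 0;
            }
        }
        if (bufferIndex > 0)
            writer.WriteRaw(buffer, 0, bufferIndex);
    }

    // Renders one <data> element using the given writing strategy.
    public static string Render(Action<XmlWriter, byte[]> writeHex, byte[] bytes)
    {
        var sw = new StringWriter();
        using (var writer = XmlWriter.Create(sw, new XmlWriterSettings { OmitXmlDeclaration = true }))
        {
            writer.WriteStartElement("data");
            writeHex(writer, bytes);
            writer.WriteEndElement();
        }
        return sw.ToString();
    }

    // True when the buffered method and WriteBinHex produce identical XML.
    public static bool MatchesBinHex(byte[] bytes) =>
        Render(WriteHexBuffered, bytes) ==
        Render((w, b) => w.WriteBinHex(b, 0, b.Length), bytes);
}
```

Testing with an array longer than 2048 bytes exercises both the full-buffer flush and the final partial flush.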
Results (time in ms, by array length):
Method           16 B    1 KB    128 KB   2 MB    Avg vs. BinHex
BinHex            971     800      906    1291    baseline
Full Char[]       828     612      780    1112    -16%
Buffer            834     671      712    1059    -17%
RawCharByChar    2624    6515     6979    8282    +524%
StringFormatX2   3706   10025    10490   26562    +1113%
I will go with the Buffer implementation, which is on average about 17% faster than WriteBinHex.
EDIT:
With a [ThreadStatic] buffer field (compared to WriteBinHex):
16 bytes: -3%, 1 KB: -10%, 128 KB: -14%, 2 MB: -11%
Average: -9%, versus -17% with the normal buffer, so I'm giving up on ThreadLocal/ThreadStatic. I also tried 128- and 256-char buffers and got similar results.
[ThreadStatic]
static char[] _threadStaticBuffer; // no initializer: it would run on only one thread, other threads would see null
private void Test(XmlWriter writer, byte[] bytes)
{
    var bufferIndex = 0;
    var bufferLength = bytes.Length < 120 ? bytes.Length * 2 : 240;
    // Allocate lazily, once per thread.
    var buffer = _threadStaticBuffer ?? (_threadStaticBuffer = new char[240]);
    for (int i = 0; i < bytes.Length; i++)
    {
        var b = bytes[i];
        buffer[bufferIndex] = HexChars[b >> 4];
        buffer[bufferIndex + 1] = HexChars[b & 15];
        bufferIndex += 2;
        if (bufferIndex.Equals(bufferLength))
        {
            writer.WriteRaw(buffer, 0, bufferLength);
            bufferIndex = 0;
        }
    }
    if (bufferIndex > 0)
        writer.WriteRaw(buffer, 0, bufferIndex);
}
EDIT-2:
After reading some posts, I benchmarked my Method-2 against the approach mentioned in https://stackoverflow.com/a/624379/2266524, which uses a 256-entry uint lookup instead of a 16-char lookup.
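The idea behind the 256-entry uint lookup is that both hex chars for a byte are precomputed and packed into one uint: the low 16 bits hold the first char and the high 16 bits the second, so one table read yields both chars. The small sketch below (class and method names are hypothetical) packs and unpacks a single entry the same way the lookup table is built in the code further down.

```csharp
static class HexLookupDemo
{
    // Same packing as the lookup table: low 16 bits = first hex char,
    // high 16 bits = second hex char.
    public static uint Pack(byte value)
    {
        string s = value.ToString("X2");
        return (uint)s[0] + ((uint)s[1] << 16);
    }

    // Splits an entry back into the two chars that get written to the buffer.
    public static string Unpack(uint entry) =>
        new string(new[] { (char)entry, (char)(entry >> 16) });
}
```

For example, byte 10 formats as "0A", so its entry is '0' (0x30) + ('A' (0x41) << 16) = 0x00410030.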
Here are the results compared to the WriteBinHex method:
Method                                    16 B   1 KB   128 KB   2 MB   Avg vs. WriteBinHex
WriteBinHex                                745    679      739   1038   baseline
Buffered char[], 256 uint lookup           653    454      502    758   -26%
Buffered char[], unsafe 256 uint lookup    645    371      424    663   -34%
The code:
Method-5: Buffer with 256 uint lookup
private static readonly uint[] _hexConversionLookup = CreateHexConversionLookup();
private static uint[] CreateHexConversionLookup()
{
    var result = new uint[256];
    for (int i = 0; i < 256; i++)
    {
        string s = i.ToString("X2");
        // Low 16 bits hold the first hex char, high 16 bits the second.
        result[i] = ((uint)s[0]) + ((uint)s[1] << 16);
    }
    return result;
}
private void TestBufferWith256UintLookup(XmlWriter writer, byte[] bytes)
{
    unchecked
    {
        var bufferIndex = 0;
        var bufferLength = bytes.Length < 2048 ? bytes.Length * 2 : 4096;
        var buffer = new char[bufferLength];
        for (int i = 0; i < bytes.Length; i++)
        {
            // One table read gives both hex chars for this byte.
            var b = _hexConversionLookup[bytes[i]];
            buffer[bufferIndex] = (char)b;
            buffer[bufferIndex + 1] = (char)(b >> 16);
            bufferIndex += 2;
            if (bufferIndex == bufferLength)
            {
                writer.WriteRaw(buffer, 0, bufferLength);
                bufferIndex = 0;
            }
        }
        if (bufferIndex > 0)
            writer.WriteRaw(buffer, 0, bufferIndex);
    }
}
Method-6: Unsafe buffer with 256 uint lookup
// Requires using System.Runtime.InteropServices; and compiling with /unsafe.
private static readonly uint[] _hexConversionLookup = CreateHexConversionLookup();
private static uint[] CreateHexConversionLookup()
{
    var result = new uint[256];
    for (int i = 0; i < 256; i++)
    {
        string s = i.ToString("X2");
        result[i] = ((uint)s[0]) + ((uint)s[1] << 16);
    }
    return result;
}
// Pins the table for the lifetime of the process (the GCHandle is never freed).
private unsafe static readonly uint* _byteHexCharsP = (uint*)GCHandle.Alloc(_hexConversionLookup, GCHandleType.Pinned).AddrOfPinnedObject();
private unsafe void TestBufferWith256UintLookupUnsafe(XmlWriter writer, byte[] bytes)
{
    fixed (byte* bytesP = bytes)
    {
        var bufferIndex = 0;
        // bufferLength counts uints, i.e. pairs of chars.
        var bufferLength = bytes.Length < 2048 ? bytes.Length : 2048;
        var charBuffer = new char[bufferLength * 2];
        fixed (char* bufferP = charBuffer)
        {
            // Each uint store writes two chars at once; this assumes a
            // little-endian machine (low 16 bits land in the first char).
            uint* buffer = (uint*)bufferP;
            for (int i = 0; i < bytes.Length; i++)
            {
                buffer[bufferIndex] = _byteHexCharsP[bytesP[i]];
                bufferIndex++;
                if (bufferIndex == bufferLength)
                {
                    writer.WriteRaw(charBuffer, 0, bufferLength * 2);
                    bufferIndex = 0;
                }
            }
        }
        if (bufferIndex > 0)
            writer.WriteRaw(charBuffer, 0, bufferIndex * 2);
    }
}
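One caveat with Method-6: storing a uint through a uint* into the char buffer puts the low 16 bits into the first char only on little-endian hardware (the common .NET targets are little-endian). If portability matters, the table could be built according to BitConverter.IsLittleEndian. The sketch below is an assumption-labeled variant of CreateHexConversionLookup, not something the benchmarks above measured; the class name is hypothetical.

```csharp
using System;

static class EndianAwareLookup
{
    // Builds the 256-entry table for the unsafe method. On little-endian
    // machines the first hex char must sit in the low 16 bits; on big-endian
    // machines it must sit in the high 16 bits.
    public static uint[] Create(bool littleEndian)
    {
        var result = new uint[256];
        for (int i = 0; i < 256; i++)
        {
            string s = i.ToString("X2");
            uint first = s[0], second = s[1];
            result[i] = littleEndian
                ? first | (second << 16)
                : second | (first << 16);
        }
        return result;
    }

    // Picks the layout matching the machine the code is running on.
    public static uint[] CreateForCurrentMachine() => Create(BitConverter.IsLittleEndian);
}
```

On a little-endian machine CreateForCurrentMachine() returns the same table as the original CreateHexConversionLookup().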
My choice is #6, but you may prefer #5 if you want a safe version. I'd appreciate any suggestions for making it faster. Thanks.
Since you want to write chars individually, WriteRaw seems to be the fastest way, especially since you have already excluded WriteValue.
You can optimize away the HexChars[b >> 4].ToString() expression by precalculating the strings.
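A sketch of that precalculation, going one step further than per-nibble strings: build the 256 two-char strings "00".."FF" once, so each byte costs a single WriteRaw(string) call and no allocation. The class and method names are hypothetical, and this variant was not part of the benchmarks in the other answer.

```csharp
using System.Xml;

static class PrecomputedHexStrings
{
    // One two-char string per byte value, built once: "00", "01", ..., "FF".
    static readonly string[] Table = BuildTable();

    static string[] BuildTable()
    {
        var table = new string[256];
        for (int i = 0; i < 256; i++)
            table[i] = i.ToString("X2");
        return table;
    }

    // One WriteRaw call per byte; no per-call string allocation.
    public static void WriteHexPrecomputed(this XmlWriter writer, byte[] bytes)
    {
        for (int i = 0; i < bytes.Length; i++)
            writer.WriteRaw(Table[bytes[i]]);
    }
}
```

This still pays WriteRaw's per-call overhead for every byte, so it would likely remain slower than writing larger chunks from a char[] buffer.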
If I were you, I would use a method that writes entire strings, so that the chars do not each have to pass through the whole processing and call tree individually. Judging from what these methods do in Reflector, that could give something like a 10x speedup. You said that you are not considering this approach, though.
Reflector also shows that WriteRaw does quite a lot of work itself, so this needs to be benchmarked.
If you don't like the temporary char[] or byte[] allocations, you can use a [ThreadStatic] temporary buffer instead. The buffer size should probably be in the range 16-256: big enough to amortize the constant per-call overheads, and small enough to fit into the L1 cache without polluting it too much.