I've been toying around with some .NET features (namely Pipelines, Memory, and Array Pools) for high speed file reading/parsing. I came across something interesting while playing around with Array.Copy, Buffer.BlockCopy and ReadOnlySequence.CopyTo. The IO Pipeline reads data as byte and I'm attempting to efficiently turn it into char.
While playing around with Array.Copy I found that I am able to copy from byte[] to char[], and the compiler (and runtime) are more than happy to do it.
char[] outputBuffer = ArrayPool<char>.Shared.Rent(buffer.Length);
Array.Copy(buffer, 0, outputBuffer, 0, buffer.Length); // length is counted in elements
This code runs as expected, though I'm sure there are some UTF edge cases not properly handled here.
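A minimal sketch of what that Array.Copy call amounts to, assuming a small standalone program (the class and method names are hypothetical, for illustration only). Each byte is widened to a char with the same numeric value, which is effectively a Latin-1 decode: fine for ASCII, but a multi-byte UTF-8 sequence such as "é" (0xC3 0xA9) turns into two separate chars, whereas Encoding.UTF8 decodes it to one.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class ByteToCharDemo
{
    // Array.Copy converts element by element: each byte becomes a char
    // with the same numeric value (0..255).
    public static char[] WidenWithArrayCopy(byte[] bytes)
    {
        var chars = new char[bytes.Length];
        Array.Copy(bytes, 0, chars, 0, bytes.Length); // length is in elements
        return chars;
    }

    // Equivalent explicit loop, showing what the widening conversion does.
    public static char[] WidenWithLoop(byte[] bytes)
    {
        var chars = new char[bytes.Length];
        for (int i = 0; i < bytes.Length; i++)
            chars[i] = (char)bytes[i];
        return chars;
    }

    // Real UTF-8 decoding: a multi-byte sequence becomes a single char.
    public static char[] DecodeUtf8(byte[] bytes)
    {
        var chars = new char[Encoding.UTF8.GetCharCount(bytes)];
        Encoding.UTF8.GetChars(bytes, 0, bytes.Length, chars, 0);
        return chars;
    }
}
```

For the sample buffer { 50, 48, 49, 56, 45 } both approaches give "2018-", because all five bytes are ASCII; they only diverge on non-ASCII input.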
My curiosity comes with Buffer.BlockCopy:
char[] outputBuffer = ArrayPool<char>.Shared.Rent(buffer.Length);
Buffer.BlockCopy(buffer, 0, outputBuffer, 0, buffer.Length); // count is in bytes
The resulting contents of outputBuffer are garbage. For example, if buffer contains
{ 50, 48, 49, 56, 45 }
the contents of outputBuffer after the copy are
{ 12338, 14385, 12333, 11575, 14385 }
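The first values can be reproduced with a minimal sketch, assuming a little-endian machine and a freshly allocated char[] in place of the pooled one (the class and method names are hypothetical). Buffer.BlockCopy counts in bytes and copies them unchanged, so each char receives two consecutive input bytes, low byte first, and five bytes fill only two and a half chars. ArrayPool<char>.Shared.Rent does not clear the arrays it returns, which likely accounts for the remaining values above: 12333 is 0x302D, whose low byte 0x2D (45) is the fifth input byte while the high byte is left over from earlier use.

```csharp
using System;

// Hypothetical demo class; the layout comments assume BitConverter.IsLittleEndian is true.
public static class BlockCopyDemo
{
    public static char[] RawCopy(byte[] bytes, int charCount)
    {
        // Fresh array, so untouched slots are zero instead of pool leftovers.
        var chars = new char[charCount];
        // Count is in BYTES: 5 bytes fill chars[0], chars[1] and the low byte of chars[2].
        Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
        return chars;
    }

    // Build a char from two bytes, low byte first (little-endian layout).
    public static char FromBytes(byte low, byte high) => (char)(low | (high << 8));
}
```

On a little-endian machine RawCopy(new byte[] { 50, 48, 49, 56, 45 }, 5) returns { 12338, 14385, 45, 0, 0 }, since 12338 = 50 + 48 * 256 and 14385 = 49 + 56 * 256.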
I'm just curious what is happening "under the hood" inside the CLR that causes these two calls to produce such different results.
Array.Copy() is smarter about the element type. It will try to use the memmove() CRT function when it can, but falls back to a loop that copies each element when it can't, converting elements as necessary: it considers boxing and primitive type conversions, such as widening each byte to a char. So one element in the source array becomes one element in the destination array.
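A short sketch of those per-element conversions (the class and method names are hypothetical, chosen for illustration): a primitive widening copy, a boxing copy, and an unboxing copy, each of which maps one source element to one destination element.

```csharp
using System;

// Illustrative only: Array.Copy converts element by element.
public static class ArrayCopyConversions
{
    public static long[] Widen(int[] source)
    {
        var dest = new long[source.Length];
        Array.Copy(source, dest, source.Length); // primitive widening: int -> long
        return dest;
    }

    public static object[] Box(int[] source)
    {
        var dest = new object[source.Length];
        Array.Copy(source, dest, source.Length); // each int is boxed
        return dest;
    }

    public static int[] Unbox(object[] source)
    {
        var dest = new int[source.Length];
        // Each element is unboxed; an element that cannot be stored as an int
        // (e.g. a string) makes Array.Copy throw InvalidCastException.
        Array.Copy(source, dest, source.Length);
        return dest;
    }
}
```

This per-element work is what makes Array.Copy safe across element types, and also what it has to skip to get down to a plain memory move.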
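Buffer.BlockCopy() skips all that and blasts the bytes across with memmove(). No conversions are considered, and its offset and count arguments are in bytes rather than elements. That is why it can be slightly faster, and why it can so easily mislead you about the array content. Do note that the UTF-8 encoded character data is visible in that array: each char holds two consecutive bytes, low byte first on a little-endian machine, so 12338 == 0x3032 is the bytes 0x32 0x30 = "20", 14385 == 0x3831 is 0x31 0x38 = "18", etc. This is easier to see with Debug > Windows > Memory > Memory 1.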
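To confirm that nothing was lost, only reinterpreted, here is a minimal sketch (hypothetical class and method names; the round trip assumes a little-endian machine) that splits a char into its two bytes and copies the raw bytes back out of a char[] to read them as ASCII text.

```csharp
using System;
using System.Text;

// Illustrative: BlockCopy leaves the bytes intact; only the element type changed.
public static class ReinterpretDemo
{
    // Split one UTF-16 code unit into its low and high bytes.
    public static (byte Low, byte High) Split(char c) => ((byte)(c & 0xFF), (byte)(c >> 8));

    // Copy the first byteCount raw bytes of a char[] back into a byte[] and read them as ASCII.
    public static string BytesAsAscii(char[] chars, int byteCount)
    {
        var bytes = new byte[byteCount];
        Buffer.BlockCopy(chars, 0, bytes, 0, byteCount);
        return Encoding.ASCII.GetString(bytes);
    }
}
```

Split((char)12338) yields (50, 48), i.e. '2' and '0', and on a little-endian machine BytesAsAscii(new[] { (char)12338, (char)14385, (char)45 }, 5) reads back "2018-".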
Noteworthy perhaps is that this type coercion is a feature. Say you receive an int[] through a socket or pipe but have the data in a byte[] buffer: Buffer.BlockCopy() into an int[] is by far the fastest way to get it back.
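A minimal sketch of that use case, assuming the sender wrote the integers in the same byte order as the receiving machine (otherwise each value would still need byte swapping); the class and method names are hypothetical.

```csharp
using System;

// Hypothetical example: a received byte[] that actually holds Int32 values
// in this machine's native byte order.
public static class IntReinterpret
{
    public static int[] ToInts(byte[] received)
    {
        if (received.Length % sizeof(int) != 0)
            throw new ArgumentException("Length must be a multiple of 4.", nameof(received));
        var ints = new int[received.Length / sizeof(int)];
        // Count is in bytes; no per-element conversion, just a raw copy.
        Buffer.BlockCopy(received, 0, ints, 0, received.Length);
        return ints;
    }
}
```

The same call works for any primitive element type; only the destination array's element count changes with the element size.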