Tom

Reputation: 577

Improve performance of BitConverter.ToInt16

I am collecting data from a USB device and this data has to go to an audio output component. At the moment I am not delivering the data fast enough to avoid clicks in the output signal. So every millisecond counts.

At the moment I am collecting the data, which is delivered in a byte array of 65536 bytes. The first two bytes represent 16 bits of data in little-endian format and must be placed in the first element of a double array. The next two bytes must be placed in the first element of a different double array. This is repeated for all the bytes in the 65536-byte buffer, so that you end up with two double[] arrays of size 16384.

I am currently using BitConverter.ToInt16 as shown in the code. It takes around 0.3 ms to run, but it has to be done 10 times to build a packet to send to the audio output. So the overhead is 3 ms, which is just enough for some packets to eventually miss their delivery deadline.

Code

byte[] buffer = new byte[65536];
double[] bufferA = new double[16384];
double[] bufferB = new double[16384];

// Each 4-byte group holds two 16-bit samples: the first goes to A, the second to B.
for (int i = 0; i < 65536; i += 4)
{
    bufferA[i / 4] = BitConverter.ToInt16(buffer, i);
    bufferB[i / 4] = BitConverter.ToInt16(buffer, i + 2);
}

How can I improve this? Is it possible to copy the values with unsafe code? I have no experience in that. Thanks

Upvotes: 6

Views: 931

Answers (2)

TheGeneral

Reputation: 81563

This gets me about triple the speed in a release build, using pointers and unsafe code. There may be other micro-optimisations; however, I'll leave those details up to the masses.

Updated

My original algorithm had a bug and could be improved, so the code below has been modified.

Modified Code

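// Requires compiling with unsafe code enabled (AllowUnsafeBlocks).
// Assumes input.Length is a multiple of 4 and a little-endian machine.
// The scale parameter is not used by the conversion itself.
public unsafe (double[], double[]) Test2(byte[] input, int scale)
{
   var bufferA = new double[input.Length / 4];
   var bufferB = new double[input.Length / 4];

   // Pin the arrays so the GC cannot move them while raw pointers are in use
   fixed (byte* pSource = input)
   fixed (double* pBufferA = bufferA, pBufferB = bufferB)
   {
      var pLen = pSource + input.Length; // one past the last input byte
      double* pA = pBufferA, pB = pBufferB;

      // Advance 4 bytes per iteration: two 16-bit samples, one per output array
      for (var pS = pSource; pS < pLen; pS += 4, pA++, pB++)
      {
         *pA = *(short*)pS;
         *pB = *(short*)(pS + 2);
      }
   }

   return (bufferA, bufferB);
}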
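For readers who would rather stay in safe code, the same de-interleaving can be sketched with plain shifts instead of pointers. This is only an illustration, not part of the original answer: the class name `Deinterleave` and the method `SplitChannels` are made up for the example, and whether it is faster than BitConverter.ToInt16 or close to the pointer version would have to be measured on the target machine.

```csharp
using System;

public static class Deinterleave
{
    // Hypothetical helper: splits interleaved little-endian 16-bit samples
    // into two double arrays without unsafe code.
    public static (double[] A, double[] B) SplitChannels(byte[] input)
    {
        if (input.Length % 4 != 0)
            throw new ArgumentException("Length must be a multiple of 4.", nameof(input));

        int frames = input.Length / 4;
        var bufferA = new double[frames];
        var bufferB = new double[frames];

        for (int frame = 0, i = 0; frame < frames; frame++, i += 4)
        {
            // Low byte first, high byte shifted up by 8 bits; the cast to
            // short restores the sign. unchecked guards against projects
            // compiled with overflow checking turned on.
            bufferA[frame] = unchecked((short)(input[i] | (input[i + 1] << 8)));
            bufferB[frame] = unchecked((short)(input[i + 2] | (input[i + 3] << 8)));
        }

        return (bufferA, bufferB);
    }
}
```

Unlike the pointer cast, the shifts spell out the little-endian byte order explicitly, so the result does not depend on the endianness of the machine running the code.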

Benchmarks

Each test is run 1000 times, with a garbage collection before each run, and scaled to various array lengths. All results are checked against the OP's original version.

Test Environment

----------------------------------------------------------------------------
Mode             : Release (64Bit)
Test Framework   : .NET Framework 4.7.1 (CLR 4.0.30319.42000)
----------------------------------------------------------------------------
Operating System : Microsoft Windows 10 Pro
Version          : 10.0.17134
----------------------------------------------------------------------------
CPU Name         : Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
Description      : Intel64 Family 6 Model 58 Stepping 9
Cores (Threads)  : 4 (8)      : Architecture  : x64
Clock Speed      : 3901 MHz   : Bus Speed     : 100 MHz
L2Cache          : 1 MB       : L3Cache       : 8 MB
----------------------------------------------------------------------------

Results

--- Random Set of byte ------------------------------------------------------
| Value    |    Average |    Fastest |    Cycles | Garbage | Test |    Gain |
--- Scale 16,384 -------------------------------------------- Time 13.727 ---
| Unsafe   |  19.487 µs |  14.029 µs |  71.479 K | 0.000 B | Pass | 59.02 % |
| Original |  47.556 µs |  34.781 µs | 169.580 K | 0.000 B | Base |  0.00 % |
--- Scale 32,768 -------------------------------------------- Time 14.809 ---
| Unsafe   |  40.398 µs |  31.274 µs | 145.024 K | 0.000 B | Pass | 56.62 % |
| Original |  93.127 µs |  79.501 µs | 329.320 K | 0.000 B | Base |  0.00 % |
--- Scale 65,536 -------------------------------------------- Time 18.984 ---
| Unsafe   |  68.318 µs |  43.550 µs | 245.083 K | 0.000 B | Pass | 68.34 % |
| Original | 215.758 µs | 160.171 µs | 758.955 K | 0.000 B | Base |  0.00 % |
--- Scale 131,072 ------------------------------------------- Time 22.620 ---
| Unsafe   | 120.764 µs |  79.208 µs | 428.626 K | 0.000 B | Pass | 71.24 % |
| Original | 419.889 µs | 322.388 µs |   1.461 M | 0.000 B | Base |  0.00 % |
-----------------------------------------------------------------------------
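The harness that produced these numbers is not shown in the answer. As a rough way to repeat the procedure described above (many runs, a garbage collection before each), a minimal Stopwatch-based sketch could look like the following. The names `MiniBench` and `AverageMicroseconds` are hypothetical, and a dedicated benchmarking library would give more trustworthy figures.

```csharp
using System;
using System.Diagnostics;

public static class MiniBench
{
    // Hypothetical helper: returns the average time per call in microseconds.
    public static double AverageMicroseconds(Action action, int runs)
    {
        if (runs <= 0) throw new ArgumentOutOfRangeException(nameof(runs));

        action(); // warm-up call so JIT compilation is not included in the timing

        long totalTicks = 0;
        for (int run = 0; run < runs; run++)
        {
            // Collect before each run so earlier garbage does not skew this one
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            var watch = Stopwatch.StartNew();
            action();
            watch.Stop();
            totalTicks += watch.ElapsedTicks;
        }

        // Stopwatch ticks are converted to seconds via Stopwatch.Frequency
        return totalTicks * 1000000.0 / Stopwatch.Frequency / runs;
    }
}
```

A call such as `MiniBench.AverageMicroseconds(() => Test2(buffer, 0), 1000)` would then time the unsafe version on a given buffer; the second argument to `Test2` is arbitrary here because the method does not use it.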

Upvotes: 4

Christopher

Reputation: 9824

"So every millisecond counts." If that is the case, you are dealing with real-time programming here. And for all its power, the .NET runtime is not ideal for real-time programming.

Garbage-collected memory management alone is usually a disqualifier for real-time programming.

Now, you can change .NET from GC memory management to direct management, and squeeze out a bit of performance by going to unsafe code and using naked pointers. But that is pretty much the point where you have removed every selling point of .NET, and it would have been better to write the whole thing, or at least that part, in native C++ to begin with.
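Short of moving to native code, one common way to keep the garbage collector out of the hot path is to allocate the output arrays once and reuse them for every packet. A double[16384] occupies 131072 bytes, which is above the 85,000-byte threshold for the large object heap, so allocating a fresh pair per packet is not free. The sketch below is an assumption about how that could be arranged, not something from the thread; `ChannelSplitter` and its members are invented names.

```csharp
using System;

public sealed class ChannelSplitter
{
    private readonly int _inputLength;

    // Output arrays are created once and overwritten on every Split call
    public double[] A { get; }
    public double[] B { get; }

    public ChannelSplitter(int inputLength)
    {
        if (inputLength < 0 || inputLength % 4 != 0)
            throw new ArgumentException("Length must be a non-negative multiple of 4.", nameof(inputLength));

        _inputLength = inputLength;
        A = new double[inputLength / 4];
        B = new double[inputLength / 4];
    }

    public void Split(byte[] input)
    {
        if (input.Length != _inputLength)
            throw new ArgumentException("Unexpected input length.", nameof(input));

        // No allocation here: results land in the arrays created by the constructor
        for (int frame = 0, i = 0; frame < A.Length; frame++, i += 4)
        {
            A[frame] = BitConverter.ToInt16(input, i);
            B[frame] = BitConverter.ToInt16(input, i + 2);
        }
    }
}
```

The trade-off is that the caller must finish with A and B (or copy them) before the next call to Split, since the contents are overwritten in place.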

Upvotes: -3
