Reputation: 161
We are looking to migrate a performance-critical application to .NET, and we find that the C# version is 30% to 100% slower than the Win32/C version, depending on the processor (the difference is more marked on a mobile T7200 processor). I have a very simple sample of code that demonstrates this. For brevity I shall just show the C version; the C# is a direct translation:
#include "stdafx.h"
#include "Windows.h"

int array1[100000];
int array2[100000];

int Test();

int main(int argc, char* argv[])
{
    int res = Test();
    return 0;
}

int Test()
{
    int calc, i, k;
    calc = 0;
    for (i = 0; i < 50000; i++) array1[i] = i + 2;
    for (i = 0; i < 50000; i++) array2[i] = 2 * i - 2;
    for (i = 0; i < 50000; i++)
    {
        for (k = 0; k < 50000; k++)
        {
            if (array1[i] == array2[k]) calc = calc - array2[i] + array1[k];
            else calc = calc + array1[i] - array2[k];
        }
    }
    return calc;
}
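For completeness, here is roughly what the direct C# translation looks like (a reconstruction, not the exact code; the actual source evidently used the names pev_tmp and gat_tmp, as the disassembly further down shows, but the structure is identical):

class Program
{
    static int[] array1 = new int[100000];
    static int[] array2 = new int[100000];

    static void Main(string[] args)
    {
        int res = Test();
    }

    static int Test()
    {
        int calc = 0;
        for (int i = 0; i < 50000; i++) array1[i] = i + 2;
        for (int i = 0; i < 50000; i++) array2[i] = 2 * i - 2;
        for (int i = 0; i < 50000; i++)
        {
            for (int k = 0; k < 50000; k++)
            {
                if (array1[i] == array2[k]) calc = calc - array2[i] + array1[k];
                else calc = calc + array1[i] - array2[k];
            }
        }
        return calc;
    }
}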
If we look at the Win32 disassembly for the 'else' branch, we have:
35: else calc = calc + array1[i] - array2[k];
004011A0 jmp Test+0FCh (004011bc)
004011A2 mov eax,dword ptr [ebp-8]
004011A5 mov ecx,dword ptr [ebp-4]
004011A8 add ecx,dword ptr [eax*4+48DA70h]
004011AF mov edx,dword ptr [ebp-0Ch]
004011B2 sub ecx,dword ptr [edx*4+42BFF0h]
004011B9 mov dword ptr [ebp-4],ecx
(this is a debug build, but bear with me)
The disassembly for the optimised C# version, taken with the CLR debugger on the optimised exe:
else calc = calc + pev_tmp[i] - gat_tmp[k];
000000a7 mov eax,dword ptr [ebp-4]
000000aa mov edx,dword ptr [ebp-8]
000000ad mov ecx,dword ptr [ebp-10h]
000000b0 mov ecx,dword ptr [ecx]
000000b2 cmp edx,dword ptr [ecx+4]
000000b5 jb 000000BC
000000b7 call 792BC16C
000000bc add eax,dword ptr [ecx+edx*4+8]
000000c0 mov edx,dword ptr [ebp-0Ch]
000000c3 mov ecx,dword ptr [ebp-14h]
000000c6 mov ecx,dword ptr [ecx]
000000c8 cmp edx,dword ptr [ecx+4]
000000cb jb 000000D2
000000cd call 792BC16C
000000d2 sub eax,dword ptr [ecx+edx*4+8]
000000d6 mov dword ptr [ebp-4],eax
Many more instructions, presumably the cause of the performance difference.
So 3 questions really:
Am I looking at the correct disassembly for the two programs, or are the tools misleading me?
If the difference in the number of generated instructions is not the cause of the performance difference, what is?
What can we possibly do about it, other than keeping all our performance-critical code in a native DLL?
Thanks in advance, Steve
PS: I did recently receive an invite to a joint MS/Intel seminar entitled something like 'Building performance-critical native applications'. Hmm...
Upvotes: 16
Views: 2008
Reputation:
Just for fun, I tried building this in C# in Visual Studio 2010, and took a look at the JITed disassembly:
else
calc = calc + array1[i] - array2[k];
000000cf mov eax,dword ptr [ebp-10h]
000000d2 add eax,dword ptr [ebp-14h]
000000d5 sub eax,edx
000000d7 mov dword ptr [ebp-10h],eax
They made a number of improvements to the JIT compiler in version 4.0 of the CLR.
Upvotes: 4
Reputation: 116674
If your application's performance critical path consists entirely of unchecked array processing, I'd advise you not to rewrite it in C#.
But then, if your application already works fine in language X, I'd advise you not to rewrite it in language Y.
What do you want to achieve from the rewrite? At the very least, give serious consideration to a mixed-language solution, using your already-debugged C code for the high-performance sections and C# for a nice user interface or convenient integration with the latest rich .NET libraries.
A longer answer on a possibly related theme.
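If you do go the mixed-language route, the interop plumbing is small. Here is a minimal sketch, assuming the existing C Test function is exported from a native DLL (the DLL name NativeCalc.dll and the export itself are hypothetical):

using System;
using System.Runtime.InteropServices;

class Program
{
    // Hypothetical export: the existing C Test() compiled into NativeCalc.dll
    // and exported with extern "C" / __declspec(dllexport).
    [DllImport("NativeCalc.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern int Test();

    static void Main()
    {
        int calc = Test();   // the heavy lifting stays in native code
        Console.WriteLine(calc);
    }
}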
Upvotes: 1
Reputation: 1500495
As others have said, one of the aspects is bounds checking. There's also some redundancy in your code in terms of array access. I've managed to improve the performance somewhat by changing the inner block to:
int tmp1 = array1[i];
int tmp2 = array2[k];
if (tmp1 == tmp2)
{
    calc = calc - array2[i] + array1[k];
}
else
{
    calc = calc + tmp1 - tmp2;
}
That change knocked the total time down from ~8.8s to ~5s.
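A simple Stopwatch harness along these lines is enough to reproduce this kind of measurement (a rough sketch, not a rigorous benchmark; it assumes the Test() method from the question has been made accessible to the harness):

using System;
using System.Diagnostics;

class Harness
{
    static void Main()
    {
        // Assumes Test() from the question is reachable here (e.g. internal static).
        Stopwatch sw = Stopwatch.StartNew();
        int result = Program.Test();
        sw.Stop();
        Console.WriteLine("calc = {0}, elapsed = {1:F1} s",
                          result, sw.Elapsed.TotalSeconds);
    }
}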
Upvotes: 6
Reputation: 564413
I believe your main issue in this code is going to be bounds checking on your arrays.
If you switch to using unsafe code in C#, and use pointer math, you should be able to achieve the same (or potentially faster) code.
This same issue was previously discussed in detail in this question.
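A rough sketch of what a pointer-based Test might look like (untested; it assumes the static array1/array2 fields from the question's translation and must be compiled with /unsafe):

// Assumes these fields exist, as in the question's translation:
// static int[] array1 = new int[100000];
// static int[] array2 = new int[100000];
static unsafe int Test()
{
    int calc = 0;
    for (int i = 0; i < 50000; i++) array1[i] = i + 2;
    for (int i = 0; i < 50000; i++) array2[i] = 2 * i - 2;

    // Pin both arrays so the GC cannot move them while we hold raw pointers.
    fixed (int* p1 = array1, p2 = array2)
    {
        for (int i = 0; i < 50000; i++)
        {
            for (int k = 0; k < 50000; k++)
            {
                // Raw pointer indexing: no per-access bounds checks.
                if (p1[i] == p2[k]) calc = calc - p2[i] + p1[k];
                else calc = calc + p1[i] - p2[k];
            }
        }
    }
    return calc;
}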
Upvotes: 18
Reputation: 135011
C# is doing bounds checking.
When running the calculation part in C# unsafe code, does it perform as well as the native implementation?
Upvotes: 2
Reputation: 705
I am sure the optimization for C is different from that for C#. You also have to expect at least a little performance slowdown, since .NET adds another layer to the application with the framework.
The trade-off is more rapid development and huge libraries of ready-made functionality in exchange for (what should be) a small amount of speed.
Upvotes: 0
Reputation: 55415
I believe you are seeing the results of bounds checks on the arrays. You can avoid the bounds checks by using unsafe code.
I believe the JIT compiler can recognize patterns like for loops that go up to array.Length and avoid the bounds check, but it doesn't look like your code can take advantage of that.
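For illustration, here is the shape of loop the JIT can typically optimize, contrasted with the shape in the question (a sketch, not a guarantee about any particular CLR version):

static int Sum(int[] data)
{
    int sum = 0;
    // The bound is the array's own Length, so the JIT can prove every index
    // is in range and drop the per-access bounds check.
    for (int i = 0; i < data.Length; i++)
        sum += data[i];
    return sum;
}

// The question's loops use a hard-coded bound (50000) against fields of length
// 100000, so the JIT generally keeps a range check on each array access.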
Upvotes: 13