Reputation: 6880
typedef unsigned char uChar;
typedef signed char sChar;
typedef unsigned short uShort;
typedef signed short sShort;
typedef unsigned int uInt;
typedef signed int sInt;
typedef unsigned long uLong;
typedef signed long sLong;
I keep a list of typedefs so that when I define variables I can be exact about their size. For instance, if I only need the numbers 0-5, I'd use uChar. I'm working in C++, making an engine, and I was reading that booleans in .NET take up X bytes and that, because of memory alignment, it would be quicker to use ints.
Is there a reason to use int rather than uChar, for memory alignment, performance, or anything similar?
Upvotes: 0
Views: 619
Reputation: 71506
Yes, using an int instead of a char will often give you a noticeable performance freebie. That is why the C language uses int to match the native register size of the processor.
It is a good idea to use unsigned int wherever possible and reach for something else only on rare, specific occasions. Avoid types smaller than int unless you have a really good reason; if you habitually use smaller-than-int variables hoping for free performance, you need to change that habit in the other direction. The same goes for signedness: use unsigned everything unless you have a really good reason to use signed.
Basically, disassemble (or compile to asm) and look at what your favorite and other compilers generate: notice the unaligned addressing caused by chars, the masking of the upper bits, the sign extension for signed chars, and so on. These are sometimes free and sometimes not, depending on where that byte is coming from and going to, and on the platform. Try at a minimum x86 and ARM, perhaps MIPS, with gcc 3.x, 4.x and llvm. In particular, notice how a single char mixed into a list of ints in a line of declarations can leave the ints that follow unaligned, which is fine on x86 from an addressing standpoint but still costs performance (even with a cache). On platforms that cannot, or prefer not to, do unaligned accesses, the extra bytes are wasted as padding instead, so you are not necessarily saving memory. The real premature optimization is trying to tune variable sizes. Use simple habits: use unsigned int for everything unless you have a specific reason not to, and put your larger, aligned variables and structures first in a list of declarations and the unaligned stuff last (shorts, then chars).
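Here is a minimal sketch of the declaration-ordering point (the struct and member names are made up, and the sizes in the comments assume a typical ABI with 4-byte int alignment; padding is implementation-defined):
#include <cstdio>

struct Mixed {      // chars wedged between ints
    char flagA;     // 1 byte + 3 bytes of padding so 'a' stays aligned
    int  a;
    char flagB;     // 1 byte + 3 bytes of padding so 'b' stays aligned
    int  b;
};                  // typically sizeof(Mixed) == 16

struct Ordered {    // larger, aligned members first, chars last
    int  a;
    int  b;
    char flagA;
    char flagB;
};                  // typically sizeof(Ordered) == 12 (2 bytes of tail padding)

int main()
{
    std::printf("Mixed: %zu, Ordered: %zu\n", sizeof(Mixed), sizeof(Ordered));
    return 0;
}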
Multiplies (and divides) complicate this habit; the best habit of all is to avoid multiplies and divides in your code. If you have to use one, know how it is implemented. It is much better to multiply two chars than two ints, for example (if the value ranges allow it), so if you happen to know your ints really hold 7-bit or 5-bit quantities, cast them down for the multiply and let a hardware multiply happen instead of a soft multiply (a dormant bug waiting to strike if those value ranges ever change!). Even though many processors have a hardware multiply, it is rare that it can be used directly: unless you help the compiler, it may have to make a library call to deal with overflow among other things, and can end up doing a soft multiply as a result, which is very costly. Divides are worse because most processors do not include a divide instruction, and even when they do you can fall into the same trap. With multiply, an N-bit by N-bit operation produces a 2*N-bit result, which is where the problem comes from; with divide, the numbers stay the same size or get smaller. In both cases the ISAs don't always provide enough bits to cover the overflow, and a library call is required to work around the processor's hardware limitations.
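As a hedged illustration of the narrowing trick (narrow_mul is a hypothetical helper; whether it wins anything depends entirely on the target, so measure before adopting it):
#include <cstdint>

// If both operands are known to fit in 8 bits, narrowing them lets the
// compiler prove the product fits in 16 bits, so a small hardware
// multiply can be used on cores whose multiplier is narrow.
static inline int16_t narrow_mul(int x, int y)
{
    // Dormant-bug warning from above: if the value ranges ever grow
    // past 8 bits, these casts silently truncate.
    return static_cast<int16_t>(static_cast<int8_t>(x) * static_cast<int8_t>(y));
}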
Floating point is a similar story: be careful with it, and don't use it unless absolutely necessary. Most folks don't remember offhand that, given
float a;
float b;
...
b = a * 1.0;
C assumes double precision for the constant unless otherwise specified, so the multiply above requires a to be converted to double, multiplied, and the result converted back to single precision. Some FPUs can do the precision conversion in the same instruction at the cost of extra clocks; some cannot. Precision conversion is also where the majority of floating point processor errors live (or used to). So either use doubles for everything, or be careful in your coding to avoid this pitfall:
float a;
float b;
...
b = a * 1.0F;
Also, most ISAs do not have an FPU, so avoid floating point math even more than you avoid fixed-point multiplies and divides. Assume most FPUs have bugs. It is difficult to write good floating point code; programmers often throw away a fair amount of the available precision simply by not knowing how to use it and how to write code for it.
A few simple habits and your code runs noticeably faster and cleaner as a freebie. The compiler also doesn't have to work as hard, so you run into fewer compiler bugs.
EDIT: adding a floating point precision example:
float fun1 ( float a ) { return(a*7.1); }
float fun2 ( float a ) { return(a*7.1F); }
The first function contains:
mulsd .LC0(%rip), %xmm0
using a 64-bit floating point constant
.LC0:
.long 1717986918
.long 1075603046
and the second function contains the desired single precision multiply:
mulss .LC1(%rip), %xmm0
with a single precision constant
.LC1:
.long 1088631603
And an integer-size example (ARM output shown below):
char fun1 ( char a ) { return(a+7); }
int fun2 ( int a ) { return(a+7); }
fun1:
    add r0, r0, #7
    and r0, r0, #255
    bx lr
fun2:
    add r0, r0, #7
    bx lr
Upvotes: 1
Reputation: 59101
Mock up a prototype and run a profiler on it before you even begin to think about micro-optimizing for performance. Remember: a constant factor (or even a small change to a coefficient) makes no difference to Big-O complexity.
In my experience, using unsigned types breaks a lot of common approaches to error checking and gets you into integer wraparound errors (and bugs) almost immediately, while at the same time making the code harder to reason about.
Also, implicit casts make bugs much more likely when using unsigned types.
For example:
#include <iostream>
#include <cstdint>
#include <stdexcept>

void SomeFunction(uint32_t value)
{
    if(value < 0)
    {
        // unreachable code: an unsigned value can never be negative.
        // What do we do instead?
        throw std::runtime_error("value must be non-negative");
    }
}

uint32_t SomeOtherFunction()
{
    return (uint32_t)2000000000 + (uint32_t)2000000000;
}

int main(int argc, char* argv[])
{
    int someValue = -1;
    SomeFunction(someValue);         // -1 is implicitly converted to 4294967295
    someValue = SomeOtherFunction(); // 4000000000 does not fit in a signed int
    std::cout << someValue;
}
Output:
-294967296
Upvotes: 1
Reputation: 81115
Because the C standard explicitly defines what happens when the bounds of an unsigned type are exceeded, compilers may have to add extra code to make narrow unsigned types behave as specified. Consequently, it's possible for one data type to be faster for values held in memory and for another type to be faster for values kept in registers. For example, consider the code:
uInt16 var1;
Int32 var2;
var1++;
var2 = var1;
The ARM processor I use only has 32-bit instructions for register operations, but it can do 8-, 16-, and 32-bit loads and stores. If var1 is in memory, it can be operated upon just as nicely as if it were a 32-bit integer, but if it's in a register the compiler will have to add an instruction to clear the upper 16 bits before copying it to var2. If var1 were a signed 16-bit integer, loading it from memory would be slower than if it were unsigned (because of the necessary sign extension), but if it were kept in a register the compiler would not be required to worry about the upper bits.
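A small sketch of that effect (the function name is made up; the comments describe what a typical ARM compiler tends to emit, which varies by compiler and target):
#include <cstdint>

// var1 arrives in a register. Because unsigned arithmetic must wrap at
// 16 bits, the compiler typically has to mask the incremented value
// (e.g. with a uxth on ARM) before it can be copied into var2.
uint32_t copy_after_increment(uint16_t var1)
{
    var1++;
    uint32_t var2 = var1;   // the masked 16-bit value is what gets copied
    return var2;
}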
Upvotes: 1
Reputation: 27212
You really don't want to waste time on stuff that's not in the critical path. Once you have things working, then you should profile and see where the problems are. Then you can speed up the trouble spots.
A system that's fast but doesn't work is worthless. A slow system that works is useful to some people and will become useful to more as it gets faster.
Also remember that an unoptimized but appropriate algorithm will beat a super-optimized poor algorithm nearly every time.
Upvotes: 5
Reputation: 137770
#include <stdint.h> to get int8_t, uint32_t, etc. For packing booleans there are also std::bitset and std::vector<bool>. And don't forget plain bool!
Upvotes: 10
Reputation: 308733
This is the kind of premature optimization that rarely matters much. I'd choose a data structure and get on with it. Once you have a complete system that has a problem, profile it to find out where the issue is. Your chances of guessing and hitting the poor performance nail on the head are small indeed.
Upvotes: 13