Reputation: 81

Is it faster to multiply low numbers in C/C++ (as opposed to high numbers)?

Example of question:

Is calculating 123 * 456 faster than calculating 123456 * 7890? Or is it the same speed?

I'm wondering about 32 bit unsigned integers, but I won't ignore answers about other types (64 bit, signed, float, etc.). If it is different, what is the difference due to? Whether or not the bits are 0/1?

Edit: If it makes a difference, I should clarify that I'm referring to any number (two random numbers lower than 100 vs two random numbers higher than 1000)

Upvotes: 2

Answers (3)

Mats Petersson

Reputation: 129314

This is highly dependendent on the processor architecture and model.

In the old days (ca 1980-1990), the number of ones in the two numbers would be a factor - the more ones, the longer it took to multiply [after sign adjustment, so multiplying by -1 wasn't slower than multiplying by 1, but multiplying by 32767 (15 ones) was notably slower than multiplying by 17 (2 ones)]. That's because a multiply is essentially:

unsigned int multiply(unsigned int a, unsigned int b)
{  
    res = 0;  
    for(number of bits)
    {
        if (b & 1)
        {
           res += a;
        }
        a <<= 1;
        b >>= 1;
    }
}

In modern processors, multiply is quite fast either way, but 64-bit multiply can be a clock cycle or two slower than a 32-bit value. Simply because modern processors can "afford" to put down the whole logic for doing this in a single cycle - both when it comes to speed of transistors themselves, and the area that those transistors take up.

Further, in the old days, there was often instructions to do 16 x 16 -> 32 bit results, but if you wanted 32 x 32 -> 32 (or 64), the compiler would have to call a library function [or inline such a function]. Today, I'm not aware of any modern high end processor [x86, ARM, PowerPC] that can't do at least 64 x 64 -> 64, some do 64 x 64 -> 128, all in a single instruction (not always a single cycle tho').

Note that I'm completely ignoring the fact that "if the data is in cache is an important factor". Yes, that is a factor - and it's a bit like ignoring wind resistance when traveling at 200 km/h - it's not at all something you ignore in the real world. However, it is quite unimportant for THIS discussion. Just like people making sports cars care about aerodynamics, to get complex [or simple] software to run fast involves a certain amount of caring about the cache-content.

Upvotes: 3

Tony Delroy

Reputation: 106068

For builtin types up to at least the architecture's word size (e.g. 64 bit on a modern PC, 32 or 16 bit on most low-cost general purpose CPUs from the last couple decades), for every compiler/implementation/version and CPU I've ever heard of, the CPU opcode for multiplication of a particular integral size takes a certain number of clock cycles irrespective of the quantities involved. Multiplications of data with different sizes, performs differently on some CPUs (e.g. AMD K7 has 3 cycles latency for 16 bit IMUL, vs 4 for 32 bit).

It is possible that on some architecture and compiler/flags combination, a type like long long int has more bits than the CPU opcodes can operate on in one instruction, so the compiler may emit code to do the multiplication in stages and that will be slower than multiplication of CPU-supported types. But again, a small value stored at run-time in a wider type is unlikely to be treated - or perform - any differently than a larger value.

All that said, if one or both values are compile-time constants, the compiler is able to avoid the CPU multiplication operator and optimise to addition or bit shifting operators for certain values (e.g. 1 is obviously a no-op, either side 0 ==> 0 result, * 4 can sometimes be implemented as << 2). There's nothing in particular stopping techniques like bit shifting being used for larger numbers, but a smaller percentage of such numbers can be optimised to the same degree (e.g. there're more powers of two - for which multiplication can be performed using bit shifting left - between 0 and 1000 than between 1000 and 2000).

Upvotes: 8

user1413793

Reputation: 9347

For all intents and purposes, the same speed (even if there were differences in computation speed, they would be immeasurable). Here is a reference benchmarking different CPU operations if you're curious: http://www.agner.org/optimize/instruction_tables.pdf.

Upvotes: -3

Is it faster to multiply low numbers in C/C++ (as opposed to high numbers)?

Answers (3)

Related Questions