Amit Singh Tomar
Amit Singh Tomar

Reputation: 8610

How do I optimize this Piece of code in C

Yesterday,in an Interview I have been asked to test the 5th bit in a number(to test whether its on and off)and below is how I done it.

int number = 16;
int mask   = 1<<5;

if ((number & mask) == 0)
    printf("Bit is off");
else
    printf("its on");

but he then asked me to optimize this code and do it without using this particular mask.

So my questions what else mask I could have used in this code?

Upvotes: 6

Views: 1121

Answers (11)

sravanreddy001
sravanreddy001

Reputation: 257

In one of the interviews, i gave the following answer, and he was satisfied, but a little change to the question was 'check if nth bit is set.

int N = 16;
printf ("%d\n", (N >> (n-1)) % 2); 

So, when making the answer generic, Not exactly sure which one (of below) is faster for this example.

1<<(n-1) & N (or)
N>>(n-1) % 2 (or)
N>>(n-1) & 1

Upvotes: 0

gbulmer
gbulmer

Reputation: 4290

Maybe the interviewer wanted to see how you reacted to a simple challenge. Or simply wanted to know if you really understood C, and would stand your ground? Maybe the interviewer wanted to know if you knew non-zero is true, and hence test your depth of understanding of C? Or maybe whether you could do binary to hex in your head?

IMHO interviews are about a lot more than code. They are expensive to do. I always tried to get a clear impression of the person, something hard to do by written communication, or even on the phone. After all, some of those folks are going to work with the recruit!

The most compact, and possibly the quickest is probably:

int number = 16;  // is this really what they gave?

printf((number & 0x20)?"its on":"Bit is off"); // did they mean 5th or bit 5?

Edit:

I've coded up the original approach, and my alternative, and compiled it for ARM Coretx-M3/4 (this is the processor I am writing for at the moment). It was compiled with -O3. Then I have disassembled each compiled file (using objdump) to get the assembler. I did it this way because the output of gcc -S was huge; that includes a lot of extra information for the assembler and linker, which made it harder to find the code.

I replaced printf with a dummy_printf to avoid #including stdio.h which added more noise. The dummy_printf isn't exactly the same as printf, it just takes one parameter, but it keeps the output short :-)

The source (all 7 files appended to make it easier to read) are at: http://pastebin.com/PTeApu9n

The resulting concatenated output of objdump (for each .o) is at: http://pastebin.com/kHAmakE3

As you can see, the original is compiled to:

void original_bit5(int number) {
    int mask = 1<<5;

    if ((number & mask) == 0)
   0:   f010 0f20   tst.w   r0, #32
   4:   d005        beq.n   1a <original_bit5+0x1a>
        dummy_printf("Bit is off");
    else
        dummy_printf("its on"); 
   6:   f240 0000   movw    r0, #0
   a:   f2c0 0000   movt    r0, #0
   e:   f7ff bffe   b.w 0 <dummy_printf>

void original_bit5(int number) {
    int mask = 1<<5;

    if ((number & mask) == 0)
        dummy_printf("Bit is off");
  12:   f240 0000   movw    r0, #0
  16:   f2c0 0000   movt    r0, #0
  1a:   f7ff bffe   b.w 0 <dummy_printf>
  1e:   bf00        nop

I think the call to dummy_printf is using tail-call chaining, i.e. dummy_printf will not return to this function. Very efficient!

There is no function entry code because the first four function parameters are passed in registers r0-r3.

You can't see the addresses of the two strings being loaded in r0. That is because this hasn't been linked.

You can see that:

int mask = 1<<5;    
if ((number & mask) == 0)

is compiled to:

   0:   f010 0f20   tst.w   r0, #32
   4:   d005        beq.n   1a <original_bit5+0x1a>

So 1<<5 and (... == 0), are compiler to a more direct and efficient sequence of instructions. There is a branch to the appropriate call of dummy_printf.

My code compiles to:

void my_bit5(int number) {
    dummy_printf((number & 0x20)?"its on":"Bit is off");    
   0:   f240 0200   movw    r2, #0
   4:   f240 0300   movw    r3, #0
   8:   f010 0f20   tst.w   r0, #32
   c:   f2c0 0200   movt    r2, #0
  10:   f2c0 0300   movt    r3, #0
  14:   bf14        ite ne
  16:   4610        movne   r0, r2
  18:   4618        moveq   r0, r3
  1a:   f7ff bffe   b.w 0 <dummy_printf>
  1e:   bf00        nop

This also seems to get tail-call optimised, i.e. there is no return from this function because there is no need of one, the return by dummy_printf will return directly to main()

What you can't see is the two registers, r2 and r2 will contain the addresses of the two strings. That is because this hasn't been linked.

As you can see there is a conditional execution instruction 'ite' which loads the parameter register r0 with either register r2 or register r3. So there is no branch in this code.

For a simple processor with pipelining, this can be quite efficient. On a simple pipelined processor, a branch can cause a 'pipeline 'stall' while parts of the pipeline are cleared out. This varies from processor to processor. So I assume gcc has got it right, and generated a better code sequence than executing a branch. I haven't checked.

Spurred on by Lundin, I have come up with this:

void union_bit5(int number) {
    union { int n; struct { unsigned :5; unsigned bit :1; }; } tester;
    tester.n = number;

    if (tester.bit)
        dummy_printf("Bit is on");
    else
        dummy_printf("its off");    
}

It does not explicitly include a mask, or bit shifting. It is almost certainly compiler dependent, you'd have to test it to ensure it works (glerk !-(

gcc for ARM generates the same code (bne vs beq, but that can be adjusted) as the OP's solution, so no optimisation, but it removes the mask:

void union_bit5(int number) {
    union { int n; struct { unsigned :5; unsigned bit :1; }; } tester;
    tester.n = number;

    if (tester.bit)
   0:   f010 0f20   tst.w   r0, #32
   4:   d105        bne.n   1a <union_bit5+0x1a>
        dummy_printf("Bit is on");
    else
        dummy_printf("its off");    
   6:   f240 0000   movw    r0, #0
   a:   f2c0 0000   movt    r0, #0
   e:   f7ff bffe   b.w 0 <dummy_printf>
void union_bit5(int number) {
    union { int n; struct { unsigned :5; unsigned bit :1; }; } tester;
    tester.n = number;

    if (tester.bit)
        dummy_printf("Bit is on");
  12:   f240 0000   movw    r0, #0
  16:   f2c0 0000   movt    r0, #0
  1a:   f7ff bffe   b.w 0 <dummy_printf>
  1e:   bf00        nop

For what it's worth:

(number & 0x20)? dummy_printf("its on") : dummy_printf("Bit is off");

gcc for ARM generates exactly the same code as the OP's. It generates branch, and not conditional instructions.

Summary:

  1. The original code is compiled to a very efficient sequence of instructions
  2. The ternary ...?...:... operator can compile to code which does not involve branches on the ARM Cortex-M3/4, but may also generate normal branch instructions.
  3. It is difficult to write more efficient code than the original in this case :-)

I'll add, IMHO the cost of doing a printf is so enormous compared to a bit test, that worrying about optimising a bit test is too small an issue; it fails Amdahl's Law. An appropriate tactic for the bit test is to ensure -O3 or -Os is used.


If you wanted to to do something somewhat perverse (especially for such a trivial problem), but different, which might make the interviewer think, create a lookup table for every byte value. (I don't expect it to be faster ...)

#define BIT5(val) (((val)&0x20)?1:0)
const unsigned char bit5[256] = {
BIT5(0x00),BIT5(0x01), BIT5(0x02),BIT5(0x03), 
BIT5(0x04),BIT5(0x05), BIT5(0x06),BIT5(0x07),
// ... you get the idea ...
BIT5(0xF8),BIT5(0xF9), BIT5(0xFA),BIT5(0xFB), 
BIT5(0xFC),BIT5(0xFD), BIT5(0xFE),BIT5(0xFF)
};

//...
if (bit5[(unsigned char)number]) {
    printf("its on");
} else {
    printf("Bit is off");
}

This is a handy technique if there are some complex bit patterns in, for example, a peripheral register, which need converting to a decision, or switch. It is O(1) too

You could combine the two!-)

Upvotes: 6

Latyas
Latyas

Reputation: 35

A c learner newbie's try

int number = 16;
if(16 == number&(0x10))
    puts("true");
else
    puts("false");

Upvotes: 0

kasavbere
kasavbere

Reputation: 6003

Forgive the following answer:

I used to work at a start-up, when the company decided not to pursue a candidate, they come up with a bogus reason to end the interview. Perhaps this was the poster's experience.

asking for kth bit may mean the least significant bit is the zeroth bit so that (number & 1 << 5) would not do. But that was not the issue. He asked for optimization. Sometime the reason you fail an interview has nothing to do with you. In that case it's their loss; there will always be another interview opportunity.

Upvotes: 0

Lundin
Lundin

Reputation: 213306

Any attempt to optimize that code falls under the category "premature optimization". If you understand how the compiler translates C to machine code, you would not attempt to optimize that code. I am guessing the interviewer lacked such knowledge.

If we dissect that code, this is what we get:

1<<5 is translated to the literal 32 at compile time. There is absolutely no difference in performance between writing int mask = 1<<5; and int mask = 32;, but the latter is far harder to understand.

Further,

  • if ((number & mask) == 0) is completely equivalent to
  • if ((number & 32) == 0) is completely equivalent to
  • if ((number & (1<<5)) == 0)

There exists two cases:

  • Either the compiler needs to find a memory location to store the mask.
    • If the user declared a variable mask, the value will be stored there.
    • If the user didn't declare a variable, the value will be stored in an invisible temporary variable.
    • RAM consumption in the two above cases is completely equivalent.
  • Or the compiler does not need to store the mask anywhere. It will optimize away the whole mask variable or numeric literal and bake them in with the rest of the program instruction.

Which of these two that will be picked depends on whether int number = 16; is modified or not from the point of declaration to the if statement where the masking takes place.

And that's it. Any attempt to write the code differently than in your original example is premature optimization and obfuscation and will not result in any performance difference.

Upvotes: 0

SquareRootOfTwentyThree
SquareRootOfTwentyThree

Reputation: 7766

Everybody is shifting to the right. I want to be original and shift to the left:

#define INDEX 5

int number = 16;

if (number<<(sizeof(number)*8-INDEX-1)<0)

  printf("Bit #%d is set in %d.\n", INDEX, number);
else    
  printf("Bit #%d is NOT set in %d.\n", INDEX, number);

This code is ugly and absolutely implementation dependent (the C standard says that the result is undefined). On x86 it works and it is somewhat more efficient because the MSB is always copied in the bit #7 ("sign") of the flags register, which can be tested with a single jns instruction.

In other words, for INDEX 5, you have:

[...]
shl $0x1F, %eax
test %eax, %eax
jns 8053635
[...]

The original solution of the OP is cleaner, and that's how production code should be like.

Upvotes: 0

madper
madper

Reputation: 796

Are you sure that you should move it 5 bits? How about that:

int n = 16;
printf ("%d\n", (n >> 4) % 2); 

Upvotes: 1

user1232138
user1232138

Reputation: 5541

(number & 16)?printf("yes"):printf("no");

Upvotes: 0

anurag-jain
anurag-jain

Reputation: 1410

One more way is

1: Right shift the number 5 times so that 5th bit becomes 0th from the right(i.e LSB).
2: Now the logic is the numbers with LSB as 1 are odd, and the ones with 0 are even, Check that using %2

If you think the mod operations are much costlier than bit operation, I think it all depends on the compiler. Have a look at this thread

AND faster than integer modulo operation?.

I'm not sure why the interviewer would have asked you to optimize, may be he was expecting the modulus method as answer.

Upvotes: 1

k06a
k06a

Reputation: 18745

There are two ways to check bit:

if (number & (1 << bit)) { ... }
if ((number >> bit) & 1) { ... }

I think it will be interesting for you: http://graphics.stanford.edu/~seander/bithacks.html

Upvotes: 2

James
James

Reputation: 25513

You could use the bit test assember instruction, but it's not impossible that the compiler will pick up on what you're doing and do that anyway.

Apart from that there isn't really anything to optimise, and certainly the only way to see if any of the minor variations on your method that are possible are faster would be to profile.

Here's the code that gcc 4.2.1 -O3 produces for if((number >> 5) & 1)):

0000000100000ee0    pushq   %rbp
0000000100000ee1    movq    %rsp,%rbp
0000000100000ee4    shrl    $0x05,%edi
0000000100000ee7    notl    %edi
0000000100000ee9    andl    $0x01,%edi
0000000100000eec    movl    %edi,%eax
0000000100000eee    leave
0000000100000eef    ret

and for if(number & (1 << 5)):

0000000100000ee0    pushq   %rbp
0000000100000ee1    movq    %rsp,%rbp
0000000100000ee4    shrl    $0x05,%edi
0000000100000ee7    notl    %edi
0000000100000ee9    andl    $0x01,%edi
0000000100000eec    movl    %edi,%eax
0000000100000eee    leave
0000000100000eef    ret

So we see that at least gcc 4.2.1 produces identical code in these cases, but it doesn't use the bit test instruction.

Upvotes: 0

Related Questions