Reputation: 61

Optimizing a line of C code for an 8-bit processor

I'm working on an 8-bit processor and have written my code with a C compiler. More than 140 lines of code take just 1200 bytes, yet this single line takes more than 200 bytes of ROM space. eeprom_read() is a function; I suspect the problem is the multiplication by 1000, 100 and 10.

romAddr = eeprom_read(146)*1000 + eeprom_read(147)*100 +
          eeprom_read(148)*10 + eeprom_read(149);

The processor is 8-bit and the data type of romAddr is int. Is there any way to write this line in a more optimized way?

Upvotes: 3

Answers (4)

sh1

Reputation: 4751

It depends very, very much on the compiler, but I would suggest that you at least simplify the multiplication this way:

romAddr = ((eeprom_read(146)*10 + eeprom_read(147))*10 +
          eeprom_read(148))*10 + eeprom_read(149);

You could put this in a loop:

uint8_t i = 146;
romAddr = eeprom_read(i);
for (i = 147; i < 150; i++)
    romAddr = romAddr * 10 + eeprom_read(i);

Hopefully the compiler will recognise how much simpler it is to multiply a 16-bit value by ten than to implement separate multiplications by 1000 and 100.

I'm not completely comfortable relying on the compiler to deal with the loop effectively, though.

Maybe:

uint8_t hi, lo;
hi = (uint8_t)eeprom_read(146) * (uint8_t)10 + (uint8_t)eeprom_read(147);
lo = (uint8_t)eeprom_read(148) * (uint8_t)10 + (uint8_t)eeprom_read(149);
romAddr = hi * (uint8_t)100 + lo;

All of these are untested.
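
One way to check them on a PC before trying them on the target is a sketch like the one below. It assumes a stand-in eeprom_read() that returns the digits 1, 2, 3, 4 for addresses 146 to 149; that stub and the function names addr_original, addr_nested, addr_loop and addr_halves are made up for illustration and are not part of the original code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the real eeprom_read(): returns the decimal
   digits 1, 2, 3, 4 for addresses 146..149 and 0 for anything else. */
static uint8_t eeprom_read(uint8_t addr)
{
    static const uint8_t digits[4] = {1, 2, 3, 4};
    return (addr >= 146 && addr <= 149) ? digits[addr - 146] : 0;
}

/* The expression from the question: multiplications by 1000, 100 and 10. */
static unsigned int addr_original(void)
{
    return eeprom_read(146) * 1000u + eeprom_read(147) * 100u +
           eeprom_read(148) * 10u + eeprom_read(149);
}

/* Nested form: three multiplications, all by ten. */
static unsigned int addr_nested(void)
{
    return ((eeprom_read(146) * 10u + eeprom_read(147)) * 10u +
            eeprom_read(148)) * 10u + eeprom_read(149);
}

/* Loop form: one multiply-by-ten and one call site in the source. */
static unsigned int addr_loop(void)
{
    uint8_t i;
    unsigned int result = eeprom_read(146);
    for (i = 147; i < 150; i++)
        result = result * 10u + eeprom_read(i);
    return result;
}

/* Two-digit halves: each half is at most 99, so it fits in 8 bits. */
static unsigned int addr_halves(void)
{
    uint8_t hi = (uint8_t)(eeprom_read(146) * 10u + eeprom_read(147));
    uint8_t lo = (uint8_t)(eeprom_read(148) * 10u + eeprom_read(149));
    return hi * 100u + lo;
}
```

With this stub all four functions should produce the same value. That only confirms the arithmetic; which variant is actually smallest still has to be read off the compiler's output for the real target.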

Upvotes: 1

Mike Dunlavey

Reputation: 40679

You're concerned about space, not time, right? You've got four function calls, with an integer argument being passed to each one, followed by a multiplication by a constant, followed by adding. Just as a first guess, that could be

load integer constant into register (6 bytes)
push register (2 bytes)
call eeprom_read (6 bytes)
adjust stack (4 bytes)
load integer multiplier into register (6 bytes)
push both registers (4 bytes)
call multiplication routine (6 bytes)
adjust stack (4 bytes)
load temporary sum into a register (6 bytes)
add to that register the result of the multiplication (2 bytes)
store back in the temporary sum (6 bytes)

Let's see, 6+2+6+4+6+4+6+4+6+2+6 = 52 bytes, so roughly 52 bytes per call to eeprom_read. The last call would be shorter because it doesn't do the multiply.

I would try calling eeprom_read not with arguments like 146 but with (unsigned char)146, and multiplying not by 1000 but by (unsigned short)1000. That way, you might be able to tease the compiler into using shorter instructions, and possibly using a multiply instruction rather than a multiply function call. Also, the call to eeprom_read might be macro'ed into a direct memory fetch, saving the pushing of the argument, the calling of the function, and the stack adjustment.

Another trick could be to store each one of the four products in a local variable, and add them all together at the end. That could generate less code. All these possibilities would also make it faster, as well as smaller, though you probably don't need to care about that.
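
A rough sketch of the casts combined with the local-variable idea might look like this. The eeprom_read() here is a made-up stub so the fragment compiles on its own, and read_rom_addr and the local names are hypothetical; whether the casts change the generated code at all depends entirely on the compiler.

```c
/* Hypothetical stub standing in for the real eeprom_read():
   digits 1, 2, 3, 4 at addresses 146..149, 0 elsewhere. */
static unsigned char eeprom_read(unsigned char addr)
{
    static const unsigned char digits[4] = {1, 2, 3, 4};
    return (addr >= 146 && addr <= 149) ? digits[addr - 146] : 0;
}

static unsigned short read_rom_addr(void)
{
    /* Narrow argument types and 16-bit multipliers, one product per local. */
    unsigned short thousands =
        (unsigned short)eeprom_read((unsigned char)146) * (unsigned short)1000;
    unsigned short hundreds =
        (unsigned short)eeprom_read((unsigned char)147) * (unsigned short)100;
    unsigned short tens =
        (unsigned short)eeprom_read((unsigned char)148) * (unsigned short)10;
    unsigned short ones = eeprom_read((unsigned char)149);

    /* Add the four partial results together at the end. */
    return (unsigned short)(thousands + hundreds + tens + ones);
}
```

The largest possible result with single decimal digits is 9999, so everything fits comfortably in 16 bits.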

Another possibility for saving space could be to use a loop, like this:

static unsigned short powerOf10[] = {1000, 100, 10, 1};
unsigned short i;
romAddr = 0;
for (i = 146; i < 150; i++){
  romAddr += powerOf10[i-146] * eeprom_read(i);
}

which should save space by having the call and the multiply only once, plus the looping instructions, rather than four copies.

In any case, get handy with the assembler language that the compiler generates.

Upvotes: 1

akalenuk

Reputation: 3845

Sometimes the multiplication can be compiled into a sequence of additions, yes. You can optimize it, say, by using the left shift operator.

A*1000 = A*512 + A*256 + A*128 + A*64 + A*32 + A*8

Or the same thing (the parentheses are required, because + binds more tightly than << in C):

(A<<9) + (A<<8) + (A<<7) + (A<<6) + (A<<5) + (A<<3)

This is still way longer than a single "multiply" instruction, but your processor apparently doesn't have one anyway, so this might be the next best thing.
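
As a minimal sketch, the same decomposition can be wrapped in small helpers for each constant in the question. The names mul1000, mul100 and mul10 are made up here, and a compiler may well generate something like this on its own, so it is worth comparing the generated code size before and after.

```c
#include <stdint.h>

/* 1000 = 512 + 256 + 128 + 64 + 32 + 8 */
static uint16_t mul1000(uint16_t a)
{
    return (uint16_t)((a << 9) + (a << 8) + (a << 7) +
                      (a << 6) + (a << 5) + (a << 3));
}

/* 100 = 64 + 32 + 4 */
static uint16_t mul100(uint16_t a)
{
    return (uint16_t)((a << 6) + (a << 5) + (a << 2));
}

/* 10 = 8 + 2 */
static uint16_t mul10(uint16_t a)
{
    return (uint16_t)((a << 3) + (a << 1));
}
```

Multiplying by 10 needs only two shifts and one addition, which is one reason a form that only ever multiplies by ten tends to be cheaper than separate multiplications by 1000 and 100.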

Upvotes: 1

unwind

Reputation: 399919

It's possible that the thing that uses the most space is the use of multiplication. If your processor lacks an instruction to do multiplication, the compiler is forced to use software to do it step by step, which can require quite a bit of code.

It's hard to say, since you don't specify anything about your target processor (or which compiler you're using).

One way might be to somehow try to reduce inlining, so the code to multiply by 10 (which is used in all four terms) can be re-used.
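
A rough sketch of that idea, assuming a stub eeprom_read() and a hypothetical helper called append_digit that holds the only multiply-by-10 in the program. How to stop a particular compiler from inlining the helper is compiler-specific and not shown here.

```c
#include <stdint.h>

/* Hypothetical stub standing in for the real eeprom_read():
   digits 1, 2, 3, 4 at addresses 146..149, 0 elsewhere. */
static uint8_t eeprom_read(uint8_t addr)
{
    static const uint8_t digits[4] = {1, 2, 3, 4};
    return (addr >= 146 && addr <= 149) ? digits[addr - 146] : 0;
}

/* The single shared routine: shift the value one decimal place to the
   left and append the digit stored at addr. */
static uint16_t append_digit(uint16_t value, uint8_t addr)
{
    return (uint16_t)(value * 10u + eeprom_read(addr));
}

static uint16_t read_rom_addr(void)
{
    uint16_t value = eeprom_read(146);
    value = append_digit(value, 147);
    value = append_digit(value, 148);
    value = append_digit(value, 149);
    return value;
}
```

If the helper stays out of line, each extra digit costs only a call rather than another copy of the multiplication code.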

To know if this is the case at all, the machine code must be inspected. By the way, the use of decimal constants for an address calculation is really odd.

Upvotes: 1
