Venemo

Reputation: 19097

Why does iostream take so much flash space on an MCU?

I use GCC 5.2.0 to compile code for an EFM32 MCU (based on a Cortex-M core). I notice an awful increase in code size when I want to #include <iostream>.

For example, let's compile the following code for an EFM32WG "Wonder Gecko" chip:

#include "em_device.h"
#include "em_chip.h"
#include <iostream>

int main(void)
{
  CHIP_Init();

  while (1) {
  }
}

This code will result in 172048 bytes of code, whereas without #include <iostream> it is only 1440 bytes.

I usually just use cout for debug output (by implementing the _write function for newlib and routing the output to the SWO pin), but it looks like this approach is very wasteful, considering the MCU only has 256k of flash, and just including this header will make the code use up most of it.
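For context, the retargeting mentioned above looks roughly like the sketch below. The names retarget_write and swo_put_char are hypothetical; on the real target the function would be exposed to newlib as extern "C" int _write(int, char *, int), and the character sink would typically call the CMSIS ITM_SendChar() routine. Here the sink only fills a buffer, so the sketch can be tried on a host machine.

```cpp
#include <cstddef>

// Host-side capture buffer standing in for the SWO pin (assumption for the sketch).
static char g_swo_capture[64];
static std::size_t g_swo_len = 0;

// Hypothetical character sink. On a Cortex-M target this would usually
// forward to CMSIS ITM_SendChar() instead of writing to a buffer.
static void swo_put_char(char c) {
    if (g_swo_len + 1 < sizeof(g_swo_capture)) {
        g_swo_capture[g_swo_len++] = c;
        g_swo_capture[g_swo_len] = '\0';
    }
}

// Same shape as the _write() hook that newlib calls for stdout/stderr.
// Renamed here to avoid clashing with a host C library symbol.
int retarget_write(int file, const char *ptr, int len) {
    (void)file;  // stdout and stderr are treated identically in this sketch
    for (int i = 0; i < len; ++i) {
        swo_put_char(ptr[i]);
    }
    return len;  // report that every byte was consumed
}
```

With a hook like this in place, anything written through newlib's stdio (and therefore cout, if it is linked in) ends up at the character sink.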

So, my question is: why does including the iostream header make the compiled code take such an insane amount of flash space? And also, is there a way to fix it?

EDIT:

Both the compiler and the linker are arm-none-eabi-g++ (version 5.2.0); the C library is the nano C library (I think).

Here are my C++ compiler flags (excluding the include paths):

-g -gdwarf-2 -mcpu=cortex-m4 -mthumb '-DEFM32WG940F256=1' -O0 -Wall -c -fmessage-length=0 -mno-sched-prolog -fno-builtin -ffunction-sections -fdata-sections -mfpu=fpv4-sp-d16 -mfloat-abi=softfp

Here are my linker flags:

-g -gdwarf-2 -mcpu=cortex-m4 -mthumb -T "${BuildArtifactFileBaseName}.ld" --specs=nosys.specs -Xlinker --gc-sections -Xlinker -Map="${BuildArtifactFileBaseName}.map" -mfpu=fpv4-sp-d16 -mfloat-abi=softfp --specs=nano.specs

I tried both with and without optimizations, but the resulting code size remains about the same (the optimized size is maybe 1k smaller).

EDIT 2

-fno-rtti and -fno-exceptions do not help with the code size either.

Upvotes: 5

Views: 1613

Answers (2)

user76329

Reputation: 13

To learn what is taking up the space, I recommend Bloaty: https://github.com/google/bloaty/tree/main

It is, indeed, the initialization required for cin, cout, cerr and clog on the ARMv7-M toolchain. Interestingly, on Linux with -O0 it costs 464 bytes more to include <iostream>. The difference appears to be dynamic vs static linkage. When statically linked, <iostream> really is that big with all its dependencies.

For a C++ solution, consider this reference implementation in uart_logger's uprint(), originally done for AVR: https://github.com/hostilefork/uart-logger

It was written pedantically for C++ for exactly this purpose: including only what we need and not paying for all of iostreams. Read uart_logger.h for the commentary, and hack it to suit your purpose.

uprint("Value:", Uprint::hex(10));            // => `Value: A\n`
uprint("Value:", Uprint::binary(4));          // => `Value: 100\n`
uprint("Current:", Uprint::units(5, "mA"));   // => `Current: 5mA\n`
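To illustrate the general idea of paying only for what you use, here is a minimal sketch of an iostream-free variadic printer. This is not uart-logger's code: tiny_print, print_one and put_char are hypothetical names, and the output goes to a buffer instead of a UART so it can be tried on a host.

```cpp
#include <cstddef>

// Capture buffer standing in for a UART (assumption for the sketch).
static char g_out[64];
static std::size_t g_len = 0;

static void put_char(char c) {
    if (g_len + 1 < sizeof(g_out)) {
        g_out[g_len++] = c;
        g_out[g_len] = '\0';
    }
}

// One small overload per supported type; nothing else gets linked in.
static void print_one(const char *s) {
    while (*s) put_char(*s++);
}

static void print_one(unsigned value) {
    char digits[10];  // enough for a 32-bit unsigned value in decimal
    int n = 0;
    do {
        digits[n++] = static_cast<char>('0' + value % 10);
        value /= 10;
    } while (value != 0);
    while (n > 0) put_char(digits[--n]);  // digits were produced in reverse
}

// Base case: end the line once all arguments are printed.
static void tiny_print() { put_char('\n'); }

// Print the first argument, a separating space if more follow, then recurse.
template <typename First, typename... Rest>
void tiny_print(First first, Rest... rest) {
    print_one(first);
    if (sizeof...(rest) > 0) put_char(' ');
    tiny_print(rest...);
}
```

Calling tiny_print("Value:", 10u) leaves `Value: 10\n` in the buffer. Only the overloads that are actually called end up in the binary.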

Cheers, Joe

Edit: updated findings on shared vs static linkage.

Upvotes: 1

Nikolai

Reputation: 489

While the compiler does try to eliminate includes, or parts of them, that are not used, this sometimes fails. Some headers cause code to be run merely by being included, which means that even if you do not refer to anything from the header, the compiler is not free to remove that code.

<iostream> is such an example, as it declares some global objects whose constructors are run before main is called. Including it increases the binary size on an STM32 by roughly 140 kB.
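The mechanism can be reproduced in a few lines. In the sketch below, StreamInit is a hypothetical stand-in for the initializer object that <iostream> brings in (libstdc++ of that era uses a static std::ios_base::Init object in the header, as far as I know). Nothing refers to g_stream_init, yet its constructor has side effects, so it must be kept and run before main.

```cpp
// Zero-initialized before any dynamic initialization takes place.
static int g_init_count = 0;

// Hypothetical stand-in for the stream initializer pulled in by <iostream>.
struct StreamInit {
    StreamInit() { ++g_init_count; }  // side effect: cannot be optimized away
};

// Never referenced anywhere, but its constructor still runs at startup,
// and everything that constructor needs must be linked into the image.
static StreamInit g_stream_init;

// Helper so callers can observe how many times the constructor ran.
int init_count() { return g_init_count; }
```

In the real header, the constructor sets up cin, cout, cerr and clog, which is what drags the stream and locale machinery into the image even when main never prints anything.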

You can check this behaviour and reasoning of the gcc developers on github.

The solution is to avoid <iostream> on microcontrollers and use what C offers for printing, such as printf().
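A minimal sketch of that approach, assuming stdout is already retargeted (for example through newlib's _write hook). debug_printf and g_last_line are hypothetical names; the helper formats into a fixed buffer first, so no heap is needed for the formatting step and the last line can be inspected.

```cpp
#include <cstdarg>
#include <cstdio>

// Fixed-size scratch buffer holding the most recently formatted line.
static char g_last_line[64];

// Hypothetical debug helper: formats like printf, then writes to stdout.
// Returns the length the full message would have had (as vsnprintf does).
int debug_printf(const char *fmt, ...) {
    va_list args;
    va_start(args, fmt);
    int n = std::vsnprintf(g_last_line, sizeof(g_last_line), fmt, args);
    va_end(args);
    if (n > 0) {
        std::fputs(g_last_line, stdout);  // truncated to the buffer size if too long
    }
    return n;
}
```

Usage is the same as printf, e.g. debug_printf("value=%d\n", 42). Plain printf() works just as well if you do not need the intermediate buffer.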

Upvotes: 4
