Quentis

Reputation: 23

Embedded C: avoiding optimization when copying a global variable to a stack variable in a critical section

I am working on the RTOS support in our SW components.

There are some scenarios where it makes sense to copy global variables into local variables inside a critical section (e.g. protected by a mutex) and then use the local copies in time-consuming operations outside the critical section.

I am afraid that the C compiler might optimize away the local variable assignments and use the global variables directly, which would undermine the effort to eliminate race conditions from the code.

I wrote an oversimplified LCD display example to illustrate the problem.

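I have the following questions:

- How could it be guaranteed that the local variables won't be optimized?
- How could it be guaranteed that the order of lock-unlock and copy happens as intended?
- Would volatile type qualifier help in this case (local variables)?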

#include <stdint.h>

uint8_t page_sel = 0;
char lcd_text[PAGE_CNT][ROW_CNT][COLUMN_CNT];

void lcd_print_text(uint8_t page, uint8_t row, const char* text)
{
  lock();
  // Store text in global variable which represents
  // the text on the display
  copy_text_to_lcd_text(page, row, text);
  unlock();

  // Display update request to run the lcd background task
  refresh_semaphore_set();
}

void lcd_select_page(uint8_t page)
{
  lock();
  // Store the selected page
  page_sel = page;
  unlock();

  // Display update request to run the lcd background task
  // If the selected page changes then lcd shall be updated
  refresh_semaphore_set();
}

void lcd_task(void)
{
  while(1) {
    // Update the display only if there are modifications
    refresh_semaphore_get();
    refresh();
  }
}

void refresh(void)
{ 
  char page_lcd_text[ROW_CNT][COLUMN_CNT];
  uint8_t page;

  lock();
  // Page number and text shall be consistent
  // so critical section is necessary
  page = page_sel;
  // Copy is created to avoid partial overwrites during
  // display update
  copy_page(page_lcd_text, lcd_text, page);
  // It is essential to have a local copy before
  // the critical section is left
  unlock();
  
  // Create pixel data in frame buffer from text (slow)
  render_text(page_lcd_text);

  // Create pixel data in frame buffer to display (slow)
  // selected page number on display
  render_page_num(page);

  // Transfer pixel data to LCD driver chip (even slower)
  lcd_spi_transfer();
}

Upvotes: 1

Views: 132

Answers (1)

Lundin

Reputation: 214770

How could it be guaranteed that the local variables won't be optimized?

The only way to ensure that is to declare them as volatile, which likely also means they will be allocated on the stack rather than in registers.
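As a minimal sketch of what that could look like: the local snapshot is declared volatile, so the assignment inside the critical section has to be performed as written. The lock()/unlock() bodies and the lock_depth counter are hypothetical stand-ins for the real RTOS primitives, and making page_sel volatile as well is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the RTOS lock primitives (assumption). */
static int lock_depth = 0;
static void lock(void)   { lock_depth++; }
static void unlock(void) { lock_depth--; }

/* Shared variable, assumed to be written by another task. */
static volatile uint8_t page_sel = 0;

/* Take a snapshot of page_sel inside the critical section. */
uint8_t snapshot_page(void)
{
    volatile uint8_t page;   /* volatile: the store below must be performed */

    lock();
    page = page_sel;         /* one read of the shared variable */
    unlock();

    return page;             /* later code works on the local copy only */
}
```

Whether the copy really stays between lock() and unlock() still depends on how the real lock primitives are implemented, so checking the generated assembly for the target compiler is advisable.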

Related to that, you will most likely not want to allocate char page_lcd_text[ROW_CNT][COLUMN_CNT] on the stack, since it will lead to a dangerous peak in stack usage whenever the function is called. I recommend declaring it static so that it gets allocated statically (in .bss, or .data if it has an initializer). If that's not possible because you fear race conditions, then allocate it on the caller side and pass a pointer to the buffer from main() to the LCD driver during initialization.

In embedded programming, a rule of thumb of good design is actually to never copy buffers at any point, since it is slow. Doing so inside locks is particularly bad design, since you stall the whole system. The normal way to design embedded systems is instead to pre-allocate n buffers statically (depending on use and on whether you need double/triple buffering etc.) and then only swap the pointers to the buffers, rather than doing a hard copy. For example, the background program works with one buffer while some SPI/DMA driver works with another, and when both are done you just swap the pointers.
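A rough sketch of the pointer-swap idea, assuming two statically allocated text buffers. The geometry values, the helper names (write_row, swap_buffers, front_row) and the empty lock()/unlock() stubs are all made up for illustration and are not taken from the question.

```c
#include <stddef.h>
#include <string.h>

#define ROW_CNT    4    /* assumed display geometry */
#define COLUMN_CNT 16

typedef char page_buf_t[ROW_CNT][COLUMN_CNT];

static page_buf_t buf_a, buf_b;        /* pre-allocated, never on the stack */
static page_buf_t *front = &buf_a;     /* buffer the display task reads */
static page_buf_t *back  = &buf_b;     /* buffer the application writes */

/* Hypothetical stand-ins for the RTOS lock primitives (assumption). */
static void lock(void)   {}
static void unlock(void) {}

/* Write one row of text into the back buffer, always NUL-terminated. */
void write_row(size_t row, const char *text)
{
    strncpy((*back)[row], text, COLUMN_CNT - 1);
    (*back)[row][COLUMN_CNT - 1] = '\0';
}

/* Only two pointers change hands inside the lock; no data is copied. */
void swap_buffers(void)
{
    lock();
    page_buf_t *tmp = front;
    front = back;
    back = tmp;
    unlock();
}

/* Read access to a row of the buffer currently owned by the display. */
const char *front_row(size_t row)
{
    return (*front)[row];
}
```

The critical section now only covers the pointer exchange. The swap must still happen only once the display side has finished with its buffer, otherwise the writer starts overwriting data that is still being rendered.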

How could it be guaranteed that the order of lock-unlock and copy happens as intended?

That's up to the compiler and in particular the RTOS library implementation. In general, compilers should not re-order instructions across code which boils down to inline assembler. So if your locks are some manner of function-like macros boiling down to inline assembler, that will prevent re-ordering. If that's not how the macros are implemented, then you'll have to ask whoever designed the RTOS how it is supposed to work.
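To illustrate the inline assembler point, here is a sketch that assumes GCC or Clang, where an empty asm statement with a "memory" clobber is commonly used as a compiler-level barrier. The COMPILER_BARRIER macro name and the irq_disabled flag (standing in for real interrupt masking) are hypothetical; a real RTOS port will do something target-specific instead.

```c
#include <stdint.h>

/* GCC/Clang extension: the compiler may not move memory accesses across it.
   It is a compiler barrier only, not a hardware memory barrier. */
#define COMPILER_BARRIER() __asm__ __volatile__ ("" ::: "memory")

static uint8_t page_sel;       /* shared variable from the question */
static int irq_disabled;       /* hypothetical stand-in for interrupt masking */

static inline void lock(void)
{
    irq_disabled = 1;
    COMPILER_BARRIER();        /* protected accesses stay after the lock */
}

static inline void unlock(void)
{
    COMPILER_BARRIER();        /* protected accesses stay before the unlock */
    irq_disabled = 0;
}

uint8_t read_page_locked(void)
{
    uint8_t page;

    lock();
    page = page_sel;           /* load is emitted between the two barriers */
    unlock();

    return page;
}
```

On multi-core or out-of-order targets a hardware barrier instruction may be needed in addition, which is another reason to check how the RTOS implements its locks.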

On the C level, (arguably) volatile accesses are not allowed to be re-ordered with respect to anything else in the code. However, some compilers are more liberal and only ensure that volatile accesses are not re-ordered in relation to each other, which is arguably non-conforming to the C standard. Either way, there is nothing in C itself except volatile which might act as a "memory barrier". Atomic access, for example, won't make a difference when it comes to re-ordering.

Would volatile type qualifier help in this case (local variables)?

Yes. And not only for the local variables but also for the file-scope ones. Otherwise the optimizer might make incorrect assumptions about whether a file-scope variable has been updated since it was last read. This is a bigger issue than potential re-ordering.
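A small sketch of the file-scope case, using a made-up refresh_pending flag that one context sets and another polls. Without volatile, the optimizer would be entitled to read the flag once and spin on the cached value.

```c
#include <stdbool.h>

/* Hypothetical flag, assumed to be set from an ISR or another task. */
static volatile bool refresh_pending = false;

void request_refresh(void)
{
    refresh_pending = true;
}

void wait_for_refresh(void)
{
    /* volatile forces a fresh read of the flag on every iteration */
    while (!refresh_pending) {
        /* busy-wait; a real task would block on a semaphore instead */
    }
    refresh_pending = false;
}
```

Note that volatile only forces the accesses to happen. It does not make the test-and-clear sequence atomic, so the lock is still needed wherever several variables must stay consistent.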

Upvotes: 3
