Lee
Lee

Reputation: 31

C++ optimization of local variable used for DMA

I have a bug in an embedded application that I'm not sure the best way to solve. I have a local variable for which I use the address as a pointer for a dma transfer. I do this a bit in the program, but only recently did it cause a failure. I think this is because optimization is reclaiming the memory for other purposes after the address is used in the program but before the dma transfer occurs. Example:

struct {
    volatile uint32_t address_ptr;
} DMA;

void spi_write(uint8_t *data) {
  DMA.address_ptr = reinterpret_cast<volatile uint32_t>(data);
}

main() {
  uint8_t data_out[2] = {0xAB, 0xCD};
  spi_write(data_out);
}

The result on spi is that random numbers are sent out. I can fix it by making data_out into a global variable. So I assume that the C compiler is assuming that it can use the data_out memory for other purposes after DMA.address_ptr has been set. I can't think of a way to prevent this from happening. Is there a way to prevent this from getting optimized like this? Or is there some other mistake?

The processor is an STM32 cortex M4. In both the optimized and unoptimized cases the DMA address points to an area of memory this is accessible to it (0x20000000 block). The issue shows itself independent of memory location: godbolt link.


Adding some additional debugging based on the comments and tinkering with godbolt compiler explorer. I'm using arm gcc 32 bit, Though I did find that clang does not seem to optimize out the assignment. At -O0 there is a definite initialization of data_out with the value 52651, which is 0xCDAB.

movw    r3, #52651

At any higher level of optimization that instruction is removed. But the program doesn't optimize to nothing. In fact it appears to just load DMA.address_ptr with some uninitialized memory from the stack. This seems to also be what is happening in my larger program. If I make address_ptr non-volatile, then it will optimize to nothing. So far I've found three solutions that will keep the assignment.

I don't understand if of these solutions is a good way of dealing with the problem.

Upvotes: 1

Views: 297

Answers (2)

Eric Postpischil
Eric Postpischil

Reputation: 222923

(a) Nothing in your program informs the compiler that data_out will be read. Its address is put into DMA.address_ptr, and DMA_address_ptr is volatile, so that tells the compiler something will read the address. But that is insufficient to tell the compiler that the bytes in data_out are read by anything. Since the compiler has no reason to believe the bytes will be read, it has no reason to store anything in the bytes. So the compiler is allowed to optimize away writes to data_out, including its initialization. To fix that, declare data_out with volatile.

(b) Nothing in your program waits for the DMA to complete. Even if data_out were declared static, the program could terminate before the DMA completes. Maybe the DMA is completing, on those occasions when it works, before the procedure of terminating the program is complete (main returns, exit functions are performed, a call is made to the operating system to terminate the process, and the operating system reclaims memory). You should have something in your program that waits for the DMA to complete.

Once you have that something, you can release the memory of data_out. It could be automatic in main (or another routine) and you return from main (or that other routine), it could be static and your program returns from main, it could be dynamically allocated and your program frees it. In any case, those are the two pieces you need:

  • You must declare data_out with volatile to inform the compiler the contents of data_out are read by means otherwise unknown to the compiler.
  • Your program must wait for the DMA to complete before it releases the memory of data_out.

Upvotes: 2

Is there a way to prevent this from getting optimized like this?

Portably? No.

What you want is:

  1. static variables, and
  2. don't return from main.

Static variables are guaranteed to exist while the body of main is executing. So, when main is entered, the static variables were already initialized, and they will exist until main is returned from.

So, the simple fix is:

main() {
  static uint8_t data_out[2] = {0xAB, 0xCD};
  spi_write(data_out);
  while (true);
}

Note that static lifetime is needed. Just looping at the end of main will not necessarily fix it:

main() {
  uint8_t data_out[2] = {0xAB, 0xCD};
  spi_write(data_out);
  while (true);
}

That's because the C code does not access data_out after the spi_write function has returned. So the lifetime of data_out, from the point of view of C code, is over anyway, since there's no code that could use data_out.

The static storage class, on the other hand, does exactly what you'd want for a DMA access: it ensures that the variable is live as long as the main thread of execution is within the main body. So, static in combination with an endless loop, is one way to do it portably.

The loop could also monitor the DMA status to detect end of transfer, and break when the DMA transfer is over. At that time the SPI peripheral may still be busy transmitting data, so to be completely safe from variations in the hardware abstraction code, you'd want to either delay (sleep) a fixed time after the DMA transfer is over, or, after the DMA transfer is done, monitor the SPI busy status, and only exit when SPI is not busy. Do not skip the DMA transfer status, since otherwise there can be a race condition when you happen to "catch" SPI being idle "between the bytes" while the DMA transfer is still ongoing and the next byte will be loaded into the SPI transmit register shortly. Only by first waiting for DMA transfer to end, and then waiting for the SPI peripheral to become idle you can be reasonably sure that the loop (and thus main) can be exited.

There's nothing wrong with just sticking an endless loop at the end of main, though - simple and effective.

The body of the wait loop can have an asm("halt"); inside of it, to conserve power.


There is no optimization. Your data_out was an automatic variable that was in scope from its definition until the end of the block. Since you exited from main, data_out ceased to exist as well. It's not the compiler doing it, it's your code telling the compiler to do it - via the semantics of the C language. In other words, the compiler doesn't "optimize" anything necessarily. It just does what the standard expects it to do, whereas your code assumed variable lifetime that was nowhere to be found in the C standard.

Upvotes: 0

Related Questions