Reputation: 31
I have a bug in an embedded application that I'm not sure the best way to solve. I have a local variable for which I use the address as a pointer for a dma transfer. I do this a bit in the program, but only recently did it cause a failure. I think this is because optimization is reclaiming the memory for other purposes after the address is used in the program but before the dma transfer occurs. Example:
struct {
volatile uint32_t address_ptr;
} DMA;
void spi_write(uint8_t *data) {
DMA.address_ptr = reinterpret_cast<volatile uint32_t>(data);
}
main() {
uint8_t data_out[2] = {0xAB, 0xCD};
spi_write(data_out);
}
The result on spi is that random numbers are sent out. I can fix it by making data_out into a global variable. So I assume that the C compiler is assuming that it can use the data_out
memory for other purposes after DMA.address_ptr
has been set. I can't think of a way to prevent this from happening. Is there a way to prevent this from getting optimized like this? Or is there some other mistake?
The processor is an STM32 cortex M4. In both the optimized and unoptimized cases the DMA address points to an area of memory this is accessible to it (0x20000000 block). The issue shows itself independent of memory location: godbolt link.
Adding some additional debugging based on the comments and tinkering with godbolt compiler explorer. I'm using arm gcc 32 bit, Though I did find that clang does not seem to optimize out the assignment. At -O0 there is a definite initialization of data_out with the value 52651, which is 0xCDAB.
movw r3, #52651
At any higher level of optimization that instruction is removed. But the program doesn't optimize to nothing. In fact it appears to just load DMA.address_ptr
with some uninitialized memory from the stack. This seems to also be what is happening in my larger program. If I make address_ptr
non-volatile, then it will optimize to nothing. So far I've found three solutions that will keep the assignment.
spi_write(volatile uint8_t *data)
asm("nop");
data_out
a static const global. It does get initialized.I don't understand if of these solutions is a good way of dealing with the problem.
Upvotes: 1
Views: 297
Reputation: 222923
(a) Nothing in your program informs the compiler that data_out
will be read. Its address is put into DMA.address_ptr
, and DMA_address_ptr
is volatile, so that tells the compiler something will read the address. But that is insufficient to tell the compiler that the bytes in data_out
are read by anything. Since the compiler has no reason to believe the bytes will be read, it has no reason to store anything in the bytes. So the compiler is allowed to optimize away writes to data_out
, including its initialization. To fix that, declare data_out
with volatile
.
(b) Nothing in your program waits for the DMA to complete. Even if data_out
were declared static
, the program could terminate before the DMA completes. Maybe the DMA is completing, on those occasions when it works, before the procedure of terminating the program is complete (main
returns, exit functions are performed, a call is made to the operating system to terminate the process, and the operating system reclaims memory). You should have something in your program that waits for the DMA to complete.
Once you have that something, you can release the memory of data_out
. It could be automatic in main
(or another routine) and you return from main
(or that other routine), it could be static
and your program returns from main
, it could be dynamically allocated and your program frees it. In any case, those are the two pieces you need:
data_out
with volatile
to inform the compiler the contents of data_out
are read by means otherwise unknown to the compiler.data_out
.Upvotes: 2
Reputation: 98425
Is there a way to prevent this from getting optimized like this?
Portably? No.
What you want is:
main
.Static variables are guaranteed to exist while the body of main
is executing. So, when main
is entered, the static variables were already initialized, and they will exist until main
is returned from.
So, the simple fix is:
main() {
static uint8_t data_out[2] = {0xAB, 0xCD};
spi_write(data_out);
while (true);
}
Note that static lifetime is needed. Just looping at the end of main
will not necessarily fix it:
main() {
uint8_t data_out[2] = {0xAB, 0xCD};
spi_write(data_out);
while (true);
}
That's because the C code does not access data_out
after the spi_write
function has returned. So the lifetime of data_out
, from the point of view of C code, is over anyway, since there's no code that could use data_out
.
The static
storage class, on the other hand, does exactly what you'd want for a DMA access: it ensures that the variable is live as long as the main thread of execution is within the main
body. So, static
in combination with an endless loop, is one way to do it portably.
The loop could also monitor the DMA status to detect end of transfer, and break when the DMA transfer is over. At that time the SPI peripheral may still be busy transmitting data, so to be completely safe from variations in the hardware abstraction code, you'd want to either delay (sleep) a fixed time after the DMA transfer is over, or, after the DMA transfer is done, monitor the SPI busy status, and only exit when SPI is not busy. Do not skip the DMA transfer status, since otherwise there can be a race condition when you happen to "catch" SPI being idle "between the bytes" while the DMA transfer is still ongoing and the next byte will be loaded into the SPI transmit register shortly. Only by first waiting for DMA transfer to end, and then waiting for the SPI peripheral to become idle you can be reasonably sure that the loop (and thus main
) can be exited.
There's nothing wrong with just sticking an endless loop at the end of main
, though - simple and effective.
The body of the wait loop can have an asm("halt");
inside of it, to conserve power.
There is no optimization. Your data_out
was an automatic variable that was in scope from its definition until the end of the block. Since you exited from main, data_out
ceased to exist as well. It's not the compiler doing it, it's your code telling the compiler to do it - via the semantics of the C language. In other words, the compiler doesn't "optimize" anything necessarily. It just does what the standard expects it to do, whereas your code assumed variable lifetime that was nowhere to be found in the C standard.
Upvotes: 0