Reputation: 14982
I have been struggling for a while to get my program to run stably. It hits hard faults at runtime, and I am going in circles.
My project:
Some FreeRTOS configs
I also use these defines to map the FreeRTOS interrupt handlers to the CMSIS handler names:
#define vPortSVCHandler SVC_Handler
#define xPortPendSVHandler PendSV_Handler
#define xPortSysTickHandler SysTick_Handler
My program does the following in sequence:
Then "StartLwIP" does:
The "Test" task does:
When I use vTaskDelay(100), the program runs without problems for hours (I ran it overnight with no issues).
When I use vTaskDelay(10), the program runs for a while (between 1 and 5 minutes), then crashes and hangs in the hard fault handler.
When I remove the vTaskDelay (which would be the preferred solution), it crashes even faster. The time varies, but it is somewhere between a few seconds and a minute.
I am 99% sure the problem is not heap or stack related. The high water marks and heap consumption look perfectly fine, nowhere near the limits of the heap or the stacks.
Memory management in LwIP is somewhat confusing to me, but since I am only opening and closing connections over and over, I can't believe I am running out of pbufs in LwIP. I increased the numbers anyway.
I have been struggling for weeks and eventually started to doubt the STM HAL. Then I stumbled upon the __HAL_LOCK macro in the peripheral drivers (UART in my case), for example in HAL_UART_Transmit_IT:
HAL_StatusTypeDef HAL_UART_Transmit_IT(UART_HandleTypeDef *huart, uint8_t *pData, uint16_t Size)
{
  /* Check that a Tx process is not already ongoing */
  if (huart->gState == HAL_UART_STATE_READY)
  {
    if ((pData == NULL) || (Size == 0U))
    {
      return HAL_ERROR;
    }

    /* Process Locked */
    __HAL_LOCK(huart);                      /* <<<<====== */

    huart->pTxBuffPtr = pData;
    huart->TxXferSize = Size;
    huart->TxXferCount = Size;

    huart->ErrorCode = HAL_UART_ERROR_NONE;
    huart->gState = HAL_UART_STATE_BUSY_TX;

    /* Process Unlocked */
    __HAL_UNLOCK(huart);                    /* <<<<====== */

    /* Enable the UART Transmit data register empty Interrupt */
    __HAL_UART_ENABLE_IT(huart, UART_IT_TXE);

    return HAL_OK;
  }
  else
  {
    return HAL_BUSY;
  }
}
When I went to the definition of the lock macro, I got a bit worried:
#if (USE_RTOS == 1U)
/* Reserved for future use */
#error "USE_RTOS should be 0 in the current HAL release"
#else
#define __HAL_LOCK(__HANDLE__) \
I've read several threads on this, here and here for example. Many of them say that the locking mechanism is poorly implemented and not thread safe at all. That is interesting, because it would then be a potential problem even without an RTOS, as soon as interrupts are involved.
I downloaded the latest STMCube version to check whether this has been fixed by now, but it is all still in the same state. The STM HAL doesn't seem to do much with the USE_RTOS macro.
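To make the thread-safety concern concrete, here is a small host-side model of a lock that first tests a flag and then sets it in a separate step. The macro body is cut off in the listing above, so this does not reproduce it; it only assumes a lock of the check-then-set kind that the linked threads complain about. All names (model_handle_t, lock_looks_free, and so on) are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a HAL handle: only the lock flag is modelled. */
typedef struct {
    bool locked;      /* the flag a check-then-set lock tests and writes */
    int  holders;     /* contexts that currently believe they own the handle */
    int  max_holders; /* highest value `holders` ever reached */
} model_handle_t;

/* Step 1: test the flag. */
static bool lock_looks_free(const model_handle_t *h)
{
    return !h->locked;
}

/* Step 2: write the flag. A context switch can happen between step 1 and step 2. */
static void lock_take(model_handle_t *h)
{
    h->locked = true;
    h->holders++;
    if (h->holders > h->max_holders) {
        h->max_holders = h->holders;
    }
}

static void lock_release(model_handle_t *h)
{
    assert(h->holders > 0);
    h->holders--;
    h->locked = false;
}

/* Replays one unlucky schedule and returns how many tasks held the lock at once. */
static int replay_unlucky_schedule(void)
{
    model_handle_t h = { false, 0, 0 };

    bool a_saw_free = lock_looks_free(&h);   /* task A: step 1 */
    /* scheduler switches to task B here */
    bool b_saw_free = lock_looks_free(&h);   /* task B: step 1 */
    if (b_saw_free) { lock_take(&h); }       /* task B: step 2, starts its transfer */
    /* scheduler switches back to task A, which acts on its stale check */
    if (a_saw_free) { lock_take(&h); }       /* task A: step 2 */

    lock_release(&h);
    lock_release(&h);
    return h.max_holders;
}
```

In this replay both tasks end up inside the "locked" region at the same time. That shows a window exists, not that it causes a hard fault. One common way to close such a window under FreeRTOS is to wrap every call on the shared UART in a mutex (xSemaphoreCreateMutex, xSemaphoreTake, xSemaphoreGive), because taking the mutex is atomic with respect to task switches.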
In my program, different tasks read and write over the same UART instance. The LwIP TCP thread sends data, while the LwIP RX thread constantly reads from the UART. The UART receives data in interrupt mode, passing it byte by byte to a ring buffer.
Finally my questions:
Is it possible that this locking mechanism is the root cause of my hard faults? I tried to find somebody who experienced the same problem but couldn't find any "proof" that would confirm this. So maybe the "horrible locking mechanism" isn't the best implementation, yet still isn't the root cause of my problem.
Are there steps I can take to get more details out of a hard fault? I would really like to find the offending line of code. I found this page that explains how to continue, but I don't know how to obtain the PC (I am using VS Code; I can break in the while(1) loop, but then what?).
It always crashes here:
HardFault_Handler
prvPortStartFirstTask
xPortStartScheduler
Sorry for the lengthy question, but I wanted to be thorough, and I hope somebody can confirm some of this or point me in the right direction to get past it.
Many thanks in advance!
Upvotes: 4
Views: 3181
Reputation: 68034
There are three common reasons for a hard fault when switching tasks:
I personally prefer to do the initialization inside the tasks, after the scheduler has started. It prevents many problems that are very difficult to track down.
I use freeRTOS running with HAL, LWIP, different types of network connections without any problems.
Upvotes: 5
Reputation: 14982
As I could have guessed, the problem was in my own code. I was using std::vector, and I think I suffered from heap fragmentation caused by the allocations the vector makes.
This page helped me a lot in troubleshooting hard faults!
https://interrupt.memfault.com/blog/cortex-m-fault-debug
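For anyone wondering how to get the PC out of a hard fault: on exception entry a Cortex-M core pushes eight registers onto the active stack, and the faulting address is one of them. Below is a minimal sketch of decoding that frame. The names fault_frame_t, decode_fault_frame and frame_is_on_psp are my own (hypothetical), it assumes the basic frame without FPU state, and the assembly shim in the comment is the usual pattern for Cortex-M3 and newer, not something taken from my project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register order pushed by the core on exception entry (basic frame, no FPU state). */
typedef struct {
    uint32_t r0, r1, r2, r3, r12;
    uint32_t lr;    /* return address of the interrupted code */
    uint32_t pc;    /* address at or near the instruction that faulted */
    uint32_t xpsr;
} fault_frame_t;

/* Copies the eight stacked words starting at `sp` into a struct. */
static fault_frame_t decode_fault_frame(const uint32_t *sp)
{
    assert(sp != NULL);
    fault_frame_t f = { sp[0], sp[1], sp[2], sp[3], sp[4], sp[5], sp[6], sp[7] };
    return f;
}

/* EXC_RETURN bit 2 tells which stack holds the frame: 0 = MSP, 1 = PSP. */
static int frame_is_on_psp(uint32_t exc_return)
{
    return (exc_return & 0x4u) != 0;
}

/*
 * On target (assumption: Thumb-2 core), a naked HardFault_Handler would
 * typically pick the right stack pointer and pass it to a C function:
 *
 *   tst   lr, #4
 *   ite   eq
 *   mrseq r0, msp
 *   mrsne r0, psp
 *   b     hard_fault_c      ; hypothetical C function taking (uint32_t *sp)
 *
 * hard_fault_c can then call decode_fault_frame(sp) and stop on a breakpoint.
 */
```

With a breakpoint inside the C handler, the pc field can be looked up in the debugger (in GDB, for example, `info line *<address>`), which usually points at or near the offending line. FreeRTOS tasks run on the PSP, so the MSP/PSP choice matters when the fault happens inside a task.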
Upvotes: 2