bas

Reputation: 14982

STM HAL with FreeRTOS, yes/no/maybe?

I have been struggling for a while to get my program running stably. It hits hard faults while running, and I am going in circles.

My project:

Some freertos configs

I am also using these defines to map the FreeRTOS interrupt handlers to the CMSIS handler names:

#define vPortSVCHandler    SVC_Handler
#define xPortPendSVHandler PendSV_Handler
#define xPortSysTickHandler SysTick_Handler

My program does the following in sequence:

Then "StartLwIP" does:

The "Test" task does:

When I use vTaskDelay(100), the program can run without problems for hours (I ran it overnight with no issues).

When I use vTaskDelay(10), the program runs for a while (between 1 and 5 minutes). Then it crashes and hangs in the hard fault handler.

When I remove the vTaskDelay (which would be the preferred solution), it will crash even faster. Again, it will vary, but somewhere within seconds to a minute.

I am 99% sure the problem is not heap / stack related. The high water marks and heap consumption look perfectly fine; neither is anywhere close to its limit.

Memory management in LWIP is somewhat confusing to me, but since I am only constantly opening and closing connections I can't believe I am running out of PBUFs in LWIP. I increased the numbers anyway.

I have been struggling for weeks, and eventually started to doubt the STM HAL. Then I stumbled upon the __HAL_LOCK in the peripheral libraries (UART in my case), for example in HAL_UART_Transmit_IT:

HAL_StatusTypeDef HAL_UART_Transmit_IT(UART_HandleTypeDef *huart, uint8_t *pData, uint16_t Size)
{
  /* Check that a Tx process is not already ongoing */
  if (huart->gState == HAL_UART_STATE_READY)
  {
    if ((pData == NULL) || (Size == 0U))
    {
      return HAL_ERROR;
    }

    /* Process Locked */
    __HAL_LOCK(huart);                         /* <<<<====== */

    huart->pTxBuffPtr = pData;
    huart->TxXferSize = Size;
    huart->TxXferCount = Size;

    huart->ErrorCode = HAL_UART_ERROR_NONE;
    huart->gState = HAL_UART_STATE_BUSY_TX;

    /* Process Unlocked */
    __HAL_UNLOCK(huart);                       /* <<<<====== */

    /* Enable the UART Transmit data register empty Interrupt */
    __HAL_UART_ENABLE_IT(huart, UART_IT_TXE);

    return HAL_OK;
  }
  else
  {
    return HAL_BUSY;
  }
}

When I go to the definition of the lock macro I got a bit worried:

#if (USE_RTOS == 1U)
  /* Reserved for future use */
  #error "USE_RTOS should be 0 in the current HAL release"
#else
  #define __HAL_LOCK(__HANDLE__)                                           \

I've read several threads on this (here and here, for example). Many of them claim that the locking mechanism is poorly implemented and not thread safe at all. That is interesting, because it would then be a potential problem even without an RTOS, as soon as interrupts are used.
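To make the concern concrete, here is a minimal host-side sketch of why a lock that first checks a flag and then sets it is not safe against preemption, next to a C11 atomic test-and-set that is. The handle types and function names are hypothetical and only illustrate the pattern; they are not the real HAL structures.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical handle types for illustration; not the real HAL structures. */
typedef struct {
    volatile int lock;      /* 0 = unlocked, 1 = locked */
} naive_handle_t;

typedef struct {
    atomic_flag lock;       /* initialise with ATOMIC_FLAG_INIT */
} atomic_handle_t;

/* Check-then-set: the read and the write are two separate steps. If a task
 * switch or interrupt lands between them, two contexts can both observe
 * "unlocked" and both proceed. */
static bool naive_try_lock(naive_handle_t *h)
{
    assert(h != NULL);
    if (h->lock == 1) {
        return false;
    }
    /* preemption at this point lets a second caller pass the check too */
    h->lock = 1;
    return true;
}

static void naive_unlock(naive_handle_t *h)
{
    h->lock = 0;
}

/* Test-and-set as one indivisible operation: only one caller can win. */
static bool atomic_try_lock(atomic_handle_t *h)
{
    assert(h != NULL);
    return !atomic_flag_test_and_set(&h->lock);
}

static void atomic_unlock(atomic_handle_t *h)
{
    atomic_flag_clear(&h->lock);
}
```

Run sequentially, both variants behave the same; the difference only shows when a second context interrupts between the check and the set. Under FreeRTOS, a more usual approach would probably be to guard all access to the shared peripheral with a mutex created by xSemaphoreCreateMutex() rather than to rely on a flag inside the driver.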

I downloaded the latest STMCube version to check whether this had been solved by now, but it is all still in the same state. The STM HAL doesn't seem to do much with its USE_RTOS macro.

In my program, different tasks read and write over the same UART instance. The LWIP TCP thread sends data, while the LWIP RX thread constantly reads from the UART. My UART receives data in interrupt mode (passing byte by byte to a ring buffer).
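For reference, the receive path is roughly of the following shape. This is a simplified sketch rather than my actual code: the names and the capacity are made up, and it assumes a single producer (the RX interrupt), a single consumer (the RX thread) and a single core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RB_SIZE 64u   /* hypothetical capacity; must be a power of two */

typedef struct {
    volatile uint8_t  data[RB_SIZE];
    volatile uint32_t head;   /* written only by the producer (ISR)   */
    volatile uint32_t tail;   /* written only by the consumer (task)  */
} ring_buffer_t;

/* Called from the receive interrupt: store one byte, or drop it if full. */
static bool rb_put(ring_buffer_t *rb, uint8_t byte)
{
    assert(rb != NULL);
    uint32_t head = rb->head;
    if (head - rb->tail == RB_SIZE) {
        return false;                       /* buffer full */
    }
    rb->data[head & (RB_SIZE - 1u)] = byte;
    rb->head = head + 1u;                   /* publish after the byte is stored */
    return true;
}

/* Called from the reading task: fetch one byte if any is available. */
static bool rb_get(ring_buffer_t *rb, uint8_t *out)
{
    assert(rb != NULL && out != NULL);
    uint32_t tail = rb->tail;
    if (tail == rb->head) {
        return false;                       /* buffer empty */
    }
    *out = rb->data[tail & (RB_SIZE - 1u)];
    rb->tail = tail + 1u;                   /* release the slot after reading */
    return true;
}
```

Because each index is written by exactly one side, this shape needs no lock between the ISR and the task. It stops being safe as soon as two tasks call rb_get() or two contexts call rb_put().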

Finally my questions:

  1. Is it possible that this locking mechanism is the root cause of my hard faults? I tried to find somebody who experienced the same problem, but couldn't find any "proof" that would confirm this. So maybe the "horrible locking mechanism" isn't the best implementation, but is not the root cause of my problem.

  2. Are there "steps" to take to get more details out of a hard fault? I would really like to find the offending line of code. I found this page that explains how to continue, but I don't know how to obtain the pc (I am using VScode, I can break in the while(1) loop, but then what...?).

It always crashes here:

HardFault_Handler
prvPortStartFirstTask
xPortStartScheduler

Sorry for the lengthy question, but I wanted to be thorough at least and hope that somebody can confirm some things, or maybe even help me in the right direction to get past this....

Many thanks in advance!

Upvotes: 4

Views: 3181

Answers (2)

0___________

Reputation: 68034

There are three common causes of a hard fault when switching tasks:

  1. Calling non-ISR API functions from an ISR.
  2. Stack overflow (enable the stack overflow checks in the FreeRTOS config).
  3. Wrong interrupt priorities.

All three are very well explained in the FreeRTOS documentation.
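As a rough illustration, the settings involved look something like the FreeRTOSConfig.h fragment below. The values are only an example and assume a part with 4 implemented priority bits (common on STM32, but check __NVIC_PRIO_BITS for your device).

```c
/* FreeRTOSConfig.h fragment; illustrative values only, assuming a device
 * with 4 implemented NVIC priority bits. */

/* Cause 2: let the kernel check task stacks on each context switch.
 * The application must then provide vApplicationStackOverflowHook(). */
#define configCHECK_FOR_STACK_OVERFLOW 2

/* Cause 3: interrupt priorities. On Cortex-M a lower number means a
 * higher priority. */
#define configPRIO_BITS 4
#define configLIBRARY_LOWEST_INTERRUPT_PRIORITY 15
#define configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY 5

#define configKERNEL_INTERRUPT_PRIORITY \
    (configLIBRARY_LOWEST_INTERRUPT_PRIORITY << (8 - configPRIO_BITS))
#define configMAX_SYSCALL_INTERRUPT_PRIORITY \
    (configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY << (8 - configPRIO_BITS))

/* Stop on failed kernel assertions; on Cortex-M ports these catch many
 * priority and API misuse mistakes early. */
#define configASSERT(x) \
    if ((x) == 0) { taskDISABLE_INTERRUPTS(); for (;;); }
```

With these example values, any interrupt that calls a FreeRTOS function must use the "FromISR" variant of that function (cause 1) and must be given a numeric priority of 5 or greater, for instance via HAL_NVIC_SetPriority().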

I personally prefer to do the initializations in the tasks, after the scheduler has started. It prevents many problems that are very difficult to track down.

I use FreeRTOS with HAL, LWIP and different types of network connections without any problems.

Upvotes: 5

bas

Reputation: 14982

As I could have guessed, the problem was in my own code. I was using std::vector and I think I suffered from heap fragmentation because of the mallocs done by vector.

This page helped me a lot in troubleshooting hard faults!

https://interrupt.memfault.com/blog/cortex-m-fault-debug
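For the question of how to obtain the pc, a widely used approach is to recover the exception stack frame inside the fault handler. The sketch below assumes a Cortex-M3/M4/M7 and a GCC-style toolchain; hard_fault_c, fault_frame_t and g_fault_frame are names chosen for illustration. The handler itself is guarded so that the decoding part also compiles on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Registers the Cortex-M core pushes on exception entry, in stack order. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12;
    uint32_t lr;    /* link register of the interrupted code */
    uint32_t pc;    /* stacked return address; usually the faulting instruction */
    uint32_t xpsr;
} fault_frame_t;

/* Copy of the frame, kept in a global so the debugger can inspect it. */
static volatile fault_frame_t g_fault_frame;    /* hypothetical name */

/* sp points at the stacked frame (MSP or PSP, chosen by the handler below). */
void hard_fault_c(const uint32_t *sp)
{
    assert(sp != NULL);
    g_fault_frame.r0   = sp[0];
    g_fault_frame.r1   = sp[1];
    g_fault_frame.r2   = sp[2];
    g_fault_frame.r3   = sp[3];
    g_fault_frame.r12  = sp[4];
    g_fault_frame.lr   = sp[5];
    g_fault_frame.pc   = sp[6];
    g_fault_frame.xpsr = sp[7];
#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__)
    __asm volatile("bkpt #0");   /* halt here when a debugger is attached */
    for (;;) {}
#endif
}

#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__)
/* Bit 2 of EXC_RETURN (in lr) tells which stack pointer was in use when
 * the fault happened; pass that pointer to the C function in r0. */
__attribute__((naked)) void HardFault_Handler(void)
{
    __asm volatile(
        "tst lr, #4        \n"
        "ite eq            \n"
        "mrseq r0, msp     \n"
        "mrsne r0, psp     \n"
        "b hard_fault_c    \n");
}
#endif
```

Once the debugger stops at the breakpoint, g_fault_frame.pc can be looked up in the disassembly view, or resolved on the command line with something like arm-none-eabi-addr2line -e firmware.elf followed by the address (firmware.elf being a placeholder for your own ELF file).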

Upvotes: 2
