Reputation: 14982

STM HAL with FreeRTOS, yes/no/maybe?

I have been struggling for a while to get my program running stably. My program hits hard faults while it is running, and I am going in circles.

My project:

Nucleo-F446ZE (STM32F446ZE)
An LTE modem connected to UART2
My PC connected to UART3 (for logging only)
FreeRTOS downloaded from Git, using its STM port
lwIP 2.1.0 downloaded from Git

Some FreeRTOS config settings:

configASSERT enabled
configCHECK_FOR_STACK_OVERFLOW set to 2
configUSE_MALLOC_FAILED_HOOK set to 1
configTOTAL_HEAP_SIZE set to 30k (I have 10k left when I query the remaining heap size)
INCLUDE_uxTaskGetStackHighWaterMark set to 1 (all tasks are within their stack limits)
SysTick is dedicated to FreeRTOS; I use TIM6 at 1 kHz to increment the HAL tick.
All NVIC interrupt priorities are set to 5 or higher (numerically), and since configASSERT is enabled, I am pretty sure interrupt priority management is covered.
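For reference, the FreeRTOS Cortex-M rule behind the "5 or higher" setting can be sketched as host-side C. This assumes the usual STM32F4 values (4 implemented NVIC priority bits, configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY set to 5); the macro and function names below are illustrative stand-ins, not FreeRTOS or HAL API. An ISR that calls ...FromISR() functions must not be more urgent (numerically lower) than the syscall ceiling. FreeRTOS also expects all priority bits to be assigned to preemption priority, which in the STM32 HAL corresponds to HAL_NVIC_SetPriorityGrouping(NVIC_PRIORITYGROUP_4).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for an STM32F4 FreeRTOSConfig.h (illustrative names). */
#define PRIO_BITS                4u  /* NVIC priority bits implemented on STM32F4 */
#define LIBRARY_MAX_SYSCALL_PRIO 5u  /* like configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY */
/* like configMAX_SYSCALL_INTERRUPT_PRIORITY: shifted into the top bits */
#define MAX_SYSCALL_PRIO (LIBRARY_MAX_SYSCALL_PRIO << (8u - PRIO_BITS))

static_assert(LIBRARY_MAX_SYSCALL_PRIO != 0u, "a BASEPRI ceiling of 0 would mask nothing");
static_assert(LIBRARY_MAX_SYSCALL_PRIO < (1u << PRIO_BITS), "must fit in the implemented bits");

/* The 8-bit value the NVIC priority register holds for a logical priority
 * (the priority lives in the top PRIO_BITS bits). */
static uint8_t nvic_register_value(uint32_t logical_prio)
{
    return (uint8_t)(logical_prio << (8u - PRIO_BITS));
}

/* An ISR may call ...FromISR() APIs only if it is not more urgent
 * (numerically lower) than the syscall ceiling. */
static bool may_call_fromisr_api(uint32_t logical_prio)
{
    return nvic_register_value(logical_prio) >= MAX_SYSCALL_PRIO;
}
```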

I also use these defines to map the FreeRTOS interrupt handlers to the CMSIS handler names:

#define vPortSVCHandler    SVC_Handler
#define xPortPendSVHandler PendSV_Handler
#define xPortSysTickHandler SysTick_Handler

My program does the following in sequence:

Set up clocks and peripherals
Enable interrupts
Create the "StartLwIP" task
Start the FreeRTOS scheduler

Then "StartLwIP" does:

Send commands via UART2 to the LTE modem to enable data mode
Initialize the lwIP stack (negotiate PPP with the peer)
Start a new "Test" task

The "Test" task does:

Open a connection to a TCP server on the internet
Send a message
Close the socket
vTaskDelay(100), vTaskDelay(10), or no delay at all
Repeat

When I use vTaskDelay(100), the program runs without problems for hours (I ran it overnight with no issues).

When I use vTaskDelay(10), the program runs for a while (between 1 and 5 minutes), then crashes and ends up in the hard fault handler.

When I remove the vTaskDelay (which would be the preferred solution), it crashes even sooner. Again, the time varies, anywhere from a few seconds to a minute.

I am 99% sure the problem is not heap or stack related. The high water marks and heap consumption look perfectly fine; neither is anywhere close to its limit.

Memory management in lwIP is somewhat confusing to me, but since I am only repeatedly opening and closing connections, I don't think I am running out of pbufs in lwIP. I increased the pool sizes anyway.
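For readers wondering which "numbers" matter here: these are lwIP 2.1 options from opt.h that size the pools involved in repeatedly opening and closing TCP connections, plus the statistics and overflow checks that help confirm whether a pool ever runs dry. The values are illustrative placeholders, not a recommendation for this project. With LWIP_STATS_DISPLAY enabled, stats_display() prints the per-pool counters, and a non-zero error count for a pool would point at exhaustion.

```c
/* lwipopts.h (excerpt) -- illustrative values only */

/* Pool sizes */
#define MEMP_NUM_TCP_PCB      10          /* TCP PCBs; connections in TIME_WAIT still hold one */
#define MEMP_NUM_NETCONN      8           /* netconns (also used by the socket API) */
#define PBUF_POOL_SIZE        16          /* pbufs in PBUF_POOL */
#define MEMP_NUM_PBUF         16          /* PBUF_ROM / PBUF_REF pbufs */
#define MEM_SIZE              (8 * 1024)  /* lwIP heap, used for PBUF_RAM */

/* Diagnostics */
#define LWIP_STATS            1
#define LWIP_STATS_DISPLAY    1           /* enables stats_display() */
#define MEM_STATS             1
#define MEMP_STATS            1
#define MEMP_OVERFLOW_CHECK   1           /* check guard bytes when an element is freed */
#define MEMP_SANITY_CHECK     1           /* check free lists for cycles after memp_free() */
```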

I have been struggling for weeks and eventually started to doubt the STM HAL. Then I stumbled upon __HAL_LOCK in the peripheral drivers (UART in my case), for example in HAL_UART_Transmit_IT:

HAL_StatusTypeDef HAL_UART_Transmit_IT(UART_HandleTypeDef *huart, uint8_t *pData, uint16_t Size)
{
  /* Check that a Tx process is not already ongoing */
  if (huart->gState == HAL_UART_STATE_READY)
  {
    if ((pData == NULL) || (Size == 0U))
    {
      return HAL_ERROR;
    }

    /* Process Locked */
    __HAL_LOCK(huart);                         /* <<<<====== */

    huart->pTxBuffPtr = pData;
    huart->TxXferSize = Size;
    huart->TxXferCount = Size;

    huart->ErrorCode = HAL_UART_ERROR_NONE;
    huart->gState = HAL_UART_STATE_BUSY_TX;

    /* Process Unlocked */
    __HAL_UNLOCK(huart);                       /* <<<<====== */

    /* Enable the UART Transmit data register empty Interrupt */
    __HAL_UART_ENABLE_IT(huart, UART_IT_TXE);

    return HAL_OK;
  }
  else
  {
    return HAL_BUSY;
  }
}

When I looked at the definition of the lock macro, I got a bit worried:

#if (USE_RTOS == 1U)
  /* Reserved for future use */
  #error "USE_RTOS should be 0 in the current HAL release"
#else
  #define __HAL_LOCK(__HANDLE__)                                           \

I've read several threads on this (here and here, for example), and many of them say the locking mechanism is poorly implemented and not thread-safe at all. Interestingly, that would be a potential problem even without an RTOS, as soon as interrupts are involved.
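Why a separate check and store cannot act as a lock between tasks can be shown with a small host-side model. In the STM32F4 HAL, __HAL_LOCK reads the handle's Lock field and then writes HAL_LOCKED in a separate statement, and in HAL_UART_Transmit_IT the gState test even happens before the lock is taken. The model below is hypothetical (model_handle_t and the replay functions are not HAL code) and replays one unlucky interleaving deterministically instead of running real tasks. For contrast, a C11 atomic_flag performs the test and the set as one indivisible operation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a non-atomic lock: check and claim are two steps. */
typedef struct {
    volatile int locked;
} model_handle_t;

/* Step 1: observe the flag. */
static bool lock_check(const model_handle_t *h) { return h->locked == 0; }

/* Step 2: claim the flag. */
static void lock_claim(model_handle_t *h) { h->locked = 1; }

/* Replays: task A checks, is preempted, task B checks and claims,
 * then A resumes and claims. Returns how many tasks think they own it. */
static int replay_preempted_lock(void)
{
    model_handle_t h = { 0 };
    int owners = 0;
    bool a_saw_free = lock_check(&h);            /* task A: check */
    bool b_saw_free = lock_check(&h);            /* task B runs before A claims */
    if (b_saw_free) { lock_claim(&h); owners++; }
    if (a_saw_free) { lock_claim(&h); owners++; } /* A resumes and claims too */
    return owners;
}

/* Same sequence with an atomic test-and-set: only one caller can win. */
static int replay_atomic_lock(void)
{
    atomic_flag f = ATOMIC_FLAG_INIT;
    int owners = 0;
    if (!atomic_flag_test_and_set(&f)) owners++;  /* task A */
    if (!atomic_flag_test_and_set(&f)) owners++;  /* task B */
    return owners;
}
```

Making the flag atomic would not by itself make the HAL safe to call from several tasks; a common approach is to give each UART a single owning task, or to serialize all callers with a FreeRTOS mutex.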

I downloaded the latest version of STM32Cube to check whether this has been fixed by now, but it is all still in the same state. The STM HAL doesn't seem to do much with its USE_RTOS macro.

In my program, different tasks read from and write to the same UART instance: the lwIP TCP thread sends data, while the lwIP RX thread constantly reads from the UART. The UART receives data in interrupt mode (passing it byte by byte to a ring buffer).
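A byte-by-byte RX path like this can be made safe without locks as long as there is exactly one producer (the UART RX interrupt) and one consumer (the task that reads). A minimal sketch of such a single-producer/single-consumer ring buffer in C11, with hypothetical names (rx_ring_t, rb_put_from_isr, rb_get) and a power-of-two size so the free-running indices wrap correctly:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 256u  /* must be a power of two */
static_assert((RB_SIZE & (RB_SIZE - 1u)) == 0u, "RB_SIZE must be a power of two");

typedef struct {
    uint8_t data[RB_SIZE];
    atomic_uint head;  /* next slot to write; advanced only by the ISR */
    atomic_uint tail;  /* next slot to read; advanced only by the reader task */
} rx_ring_t;

static void rb_init(rx_ring_t *rb)
{
    atomic_init(&rb->head, 0u);
    atomic_init(&rb->tail, 0u);
}

/* Called from the RX interrupt with one byte. Returns false (byte dropped)
 * when the buffer is full. */
static bool rb_put_from_isr(rx_ring_t *rb, uint8_t byte)
{
    unsigned head = atomic_load_explicit(&rb->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&rb->tail, memory_order_acquire);
    if (head - tail == RB_SIZE) {
        return false;
    }
    rb->data[head & (RB_SIZE - 1u)] = byte;
    /* release: the byte is visible before the new head is */
    atomic_store_explicit(&rb->head, head + 1u, memory_order_release);
    return true;
}

/* Called from the single reader task. Returns false when empty. */
static bool rb_get(rx_ring_t *rb, uint8_t *out)
{
    unsigned tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&rb->head, memory_order_acquire);
    if (head == tail) {
        return false;
    }
    *out = rb->data[tail & (RB_SIZE - 1u)];
    atomic_store_explicit(&rb->tail, tail + 1u, memory_order_release);
    return true;
}
```

If a second task ever reads from the same buffer, that access needs its own serialization; the lock-free scheme only covers the one-ISR, one-task case. The transmit side, with several tasks calling into the same UART handle, is a separate problem that this buffer does not solve.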

Finally my questions:

Could this locking mechanism be the root cause of my hard faults? I tried to find somebody with the same problem but couldn't find any "proof" that would confirm it. So maybe the "horrible locking mechanism" isn't the best implementation, but it is not the root cause of my problem.
Are there steps I can take to get more details out of a hard fault? I would really like to find the offending line of code. I found this page that explains how to continue, but I don't know how to obtain the PC (I am using VS Code; I can break in the while(1) loop, but then what?).

When it crashes, the call stack always shows:

HardFault_Handler
prvPortStartFirstTask
xPortStartScheduler
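One way to get the PC, following the approach described in the Memfault article linked in the answer below, is to replace the default HardFault_Handler with a small naked handler that hands the stacked exception frame to C. On exception entry the Cortex-M4 pushes r0-r3, r12, lr, pc and xPSR; bit 2 of EXC_RETURN (in lr) says whether that frame is on the MSP or the PSP (FreeRTOS tasks run on the PSP). The target-only part is guarded by __ARM_ARCH_7EM__ so the decoding helpers also compile on a PC. hard_fault_handler_c, classify_cfsr and fault_kind_t are hypothetical names, and a project whose stm32f4xx_it.c already defines HardFault_Handler would presumably need that definition removed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic frame stacked on exception entry; any FPU context follows xpsr. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exception_frame_t;
static_assert(offsetof(exception_frame_t, pc) == 6 * sizeof(uint32_t),
              "pc is the 7th stacked word");

/* Selected bits of SCB->CFSR (0xE000ED28). */
#define CFSR_IACCVIOL    (1u << 0)  /* MemManage: instruction access violation */
#define CFSR_DACCVIOL    (1u << 1)  /* MemManage: data access violation */
#define CFSR_PRECISERR   (1u << 9)  /* BusFault: precise, stacked pc is the culprit */
#define CFSR_IMPRECISERR (1u << 10) /* BusFault: imprecise, stacked pc is later code */
#define CFSR_BFARVALID   (1u << 15) /* BFAR (0xE000ED38) holds the bad address */
#define CFSR_UNDEFINSTR  (1u << 16) /* UsageFault: undefined instruction */
#define CFSR_INVSTATE    (1u << 17) /* UsageFault: e.g. branch to address with bit 0 clear */
#define CFSR_UNALIGNED   (1u << 24) /* UsageFault: unaligned access (when trapped) */
#define CFSR_DIVBYZERO   (1u << 25) /* UsageFault: divide by zero (when trapped) */

typedef enum {
    FAULT_UNKNOWN, FAULT_MEM_INSTR, FAULT_MEM_DATA, FAULT_BUS_PRECISE,
    FAULT_BUS_IMPRECISE, FAULT_UNDEF_INSTR, FAULT_INVALID_STATE,
    FAULT_UNALIGNED, FAULT_DIV_BY_ZERO
} fault_kind_t;

/* Returns the first cause found in the CFSR value. */
static fault_kind_t classify_cfsr(uint32_t cfsr)
{
    if (cfsr & CFSR_IACCVIOL)    return FAULT_MEM_INSTR;
    if (cfsr & CFSR_DACCVIOL)    return FAULT_MEM_DATA;
    if (cfsr & CFSR_PRECISERR)   return FAULT_BUS_PRECISE;
    if (cfsr & CFSR_IMPRECISERR) return FAULT_BUS_IMPRECISE;
    if (cfsr & CFSR_UNDEFINSTR)  return FAULT_UNDEF_INSTR;
    if (cfsr & CFSR_INVSTATE)    return FAULT_INVALID_STATE;
    if (cfsr & CFSR_UNALIGNED)   return FAULT_UNALIGNED;
    if (cfsr & CFSR_DIVBYZERO)   return FAULT_DIV_BY_ZERO;
    return FAULT_UNKNOWN;
}

#if defined(__ARM_ARCH_7EM__)
/* Target only: called from the naked handler with r0 = frame address. */
__attribute__((used)) void hard_fault_handler_c(exception_frame_t *frame)
{
    volatile uint32_t cfsr = *(volatile const uint32_t *)0xE000ED28u;
    volatile uint32_t pc = frame->pc;   /* map to a source line with addr2line */
    volatile uint32_t lr = frame->lr;   /* often points into the caller */
    volatile fault_kind_t kind = classify_cfsr(cfsr);
    (void)pc; (void)lr; (void)kind;
    if (*(volatile const uint32_t *)0xE000EDF0u & 1u) { /* DHCSR.C_DEBUGEN */
        __asm volatile("bkpt #0");      /* halt only if a debugger is attached */
    }
    for (;;) {}
}

/* Pick MSP or PSP from EXC_RETURN bit 2 and jump to the C handler. */
__attribute__((naked)) void HardFault_Handler(void)
{
    __asm volatile(
        "tst lr, #4             \n"
        "ite eq                 \n"
        "mrseq r0, msp          \n"
        "mrsne r0, psp          \n"
        "b hard_fault_handler_c \n");
}
#endif
```

With a breakpoint in hard_fault_handler_c, the pc and lr locals can be read in the VS Code debugger and translated with, for example, arm-none-eabi-addr2line -e firmware.elf followed by the address (firmware.elf is a placeholder for the project's ELF file). If the cause is an imprecise bus fault, the stacked PC only points somewhere after the faulting store.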

Sorry for the lengthy question, but I wanted to be thorough. I hope somebody can confirm some of this, or even point me in the right direction to get past it.

Many thanks in advance!

Upvotes: 4

Answers (2)

0___________

Reputation: 68034

There are three common reasons for a hard fault when switching tasks:

Use of non-ISR functions in an ISR.
Stack overflow (enable stack overflow checks in the FreeRTOS config).
Wrong interrupt priorities.

All three are very well explained in the FreeRTOS documentation.

I personally prefer to do the initializations in the tasks, after the scheduler has started. It prevents many problems that are very difficult to track down.

I use FreeRTOS with the HAL, lwIP and different types of network connections without any problems.

Upvotes: 5

bas

Reputation: 14982

As I could have guessed, the problem was in my own code. I was using std::vector, and I think I suffered from heap fragmentation caused by the allocations std::vector makes.

This page helped me a lot in troubleshooting hard faults:

https://interrupt.memfault.com/blog/cortex-m-fault-debug
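Heap fragmentation like this is often avoided in long-running firmware by not using a general-purpose heap for objects that are created and destroyed over and over, and drawing them from a fixed-block pool instead. Because every block has the same size, freeing and reallocating in any order cannot fragment the pool. A minimal sketch in C with hypothetical names (pool_t, pool_alloc, pool_free) and made-up sizes; it is not thread-safe as written, so calls from several FreeRTOS tasks would need a critical section or mutex around them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCK_SIZE  64u
#define POOL_BLOCK_COUNT 16u

typedef union block {
    union block *next;                /* free-list link while the block is free */
    uint8_t bytes[POOL_BLOCK_SIZE];   /* payload while the block is in use */
} block_t;
static_assert(sizeof(block_t) == POOL_BLOCK_SIZE, "block must be exactly one payload");

typedef struct {
    block_t blocks[POOL_BLOCK_COUNT];
    block_t *free_list;
} pool_t;

/* Threads every block onto the free list. */
static void pool_init(pool_t *p)
{
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        p->blocks[i].next = p->free_list;
        p->free_list = &p->blocks[i];
    }
}

/* Returns a block, or NULL when the pool is exhausted. */
static void *pool_alloc(pool_t *p)
{
    block_t *b = p->free_list;
    if (b != NULL) {
        p->free_list = b->next;
    }
    return b;
}

/* Returns a block obtained from pool_alloc to the free list. */
static void pool_free(pool_t *p, void *ptr)
{
    block_t *b = ptr;
    b->next = p->free_list;
    p->free_list = b;
}
```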

Upvotes: 2
