iMrFelix

Reputation: 345

STM32 interrupt-driven UART receive fails after several flawless receives

Please note the clarification and update at the end of the post

TL;DR: An STM32 has 3 UART connections, 1 for debugging and 2 for actual communication which use the interrupt-driven HAL_UART_Receive_IT. Initially, interrupt driven UART receive works fine, though over time the receive callback for one of the UARTs fires less and less until eventually the STM32 doesn't receive any packets on that one UART at all (despite me being able to verify that they were sent). I suspect the issue to be timing related.

Situation: As part of my thesis, I developed a novel protocol which now has to be implemented and tested. It involves two classes of actors, a server and devices. A device consists of an STM32, ESP32 and a UART to Ethernet bridge. The STM32 is connected via UART to the bridge and via UART to the ESP32. The bridge connects the STM32 to the server by converting serial data sent by the STM32 to TCP packets which it forwards to the server (and vice versa). The ESP32 receives framed packets from the STM32, broadcasts them via BLE and forwards all received and well-formed BLE packets to the STM32. I.e. the ESP32 is just a BLE bridge. The server and ESP32 seem to be working flawlessly.

In a nutshell, the server tries to find out which devices D_j can hear BLE advertisements from device D_i. The server does that by periodically iterating over all devices D_1, ..., D_n and sends them nonces Y_1, ..., Y_n encrypted as X_1, ..., X_n. Upon D_i receiving X_i, it decrypts it to get Y_i, which it then forwards to the ESP32 to be broadcasted via BLE. Conversely, whenever the STM32 receives a packet from the ESP32 (i.e. a packet broadcasted via BLE), it extracts some data, encrypts it and forwards it to the server.

After the server has iterated over all devices, it looks at all the messages it received during that round. If it e.g. received a message with value Y_i sent by D_j, it can deduce that D_i's broadcast somehow arrived at D_j.

Problem: The way I have it set up right now, each STM32 seems to occasionally "miss" messages sent by the ESP32. The more such devices I have in my setup, the worse it gets! With just two devices, the protocol works 100% of the time. With three devices, it also seems to work fine. However, with four devices the STM32's UART receive callback for the ESP32 works fine initially, but after a couple of rounds it no longer triggers for every packet, until eventually it doesn't trigger at all.

Visualization: The picture below shows a sample topology of n devices. Not drawn here: if e.g. D_1 were to receive Y_2, it would encrypt it to X_2' and send it across the bridge to the server.

[Image: Visualization of the network and timeline]

N.B.:

  1. Encryption and decryption each take ca. 130ms
  2. The average one-way delay from one ESP32 receiving a packet over UART and broadcasting it to another ESP32 receiving it is ca. 15ms
  3. I am aware that UART is not a reliable protocol per se and that one should use framing in a real setting. Nevertheless, I was instructed to just assume that UART is perfect and doesn't drop anything.
  4. Due to the larger scope of the project, using an RTOS is not an option

Code:

#define LEN_SERVER_FRAMED_PACKET 35
#define LEN_BLE_PACKET           24

volatile bool_t new_server_msg;
volatile bool_t new_ble_msg;

byte_t s_rx_framed_buf[LEN_SERVER_FRAMED_PACKET];   // Receive buffer to be used in all subsequent Server send operations
ble_packet_t ble_rx_struct;        // A struct. The whole struct is then interpreted as uint8_t ptr. when being sent to the ESP32 over UART

Init:

< set up some stuff>
err = HAL_UART_Receive_IT(&SERVER_UART, s_rx_framed_buf, LEN_SERVER_FRAMED_PACKET);
if (!check_success_hal("Init, setting Server ISR", __LINE__, err)){
    print_string("Init after Signup: Was NOT able to set SERVER_UART ISR");
}else{
    print_string("Init after Signup: Was able to set SERVER_UART ISR");

}
err = HAL_UART_Receive_IT(&BLE_UART, (uint8_t *)&ble_rx_struct, LEN_BLE_PACKET); // HAL expects a uint8_t pointer, so the struct address is cast
if(!check_success_hal("Init, setting BLE ISR", __LINE__, err)){
    print_string("Init after Signup: Was NOT able to set BLE_UART ISR");
}else{
    print_string("Init after Signup: Was able to set BLE_UART ISR");

}

Main loop:

while (1)
{

    // (2) Go over all 3 cases: New local alert, new BLE message and new Server message and handle them accordingly

    // (2.1) Check whether a new local alert has come in
    if (<something irrelevant happens>)
    {
        <do something irrelevant>
    }

    // (2.2) Check for new ble packet. Technically it checks for packets from the UART to the ESP32.
    if (new_ble_msg)
    {
        new_ble_msg = FALSE;
        int ble_rx_type_code = ble_parse_packet(&ble_rx_nonce, &ble_rx_struct);
        HAL_UART_Receive_IT(&BLE_UART, (uint8_t *)&ble_rx_struct, LEN_BLE_PACKET);                // Listen for new BLE messages.
        <compute some stuff, rather quick> server_tx_encrypted(<stuff computed>, &c_write, "BLE", __LINE__); // Encrypts <stuff computed> and sends it to the server using a BLOCKING HAL_UART_Transmit(...).
                                                                                                             // Encryption takes ca. 130ms.
    }

    // (2.3) Check for new server packet
    if (new_server_msg)
    {
        new_server_msg = FALSE;                                             // Set flag to false
        memcpy(s_wx_framed_buf, s_rx_framed_buf, LEN_SERVER_FRAMED_PACKET); // Copy from framed receive buffer to framed working buffer.
                                                                            // This is done such that we can process the current message while also being able to receive new messages

        HAL_UART_Receive_IT(&SERVER_UART, s_rx_framed_buf, LEN_SERVER_FRAMED_PACKET); // Listen for new server messages.

        <decrypt it, takes ca.130 - 150ms. results in buffer ble_tx_struct>

            err = HAL_UART_Transmit(&BLE_UART, ble_tx_struct,
                                    LEN_BLE_PACKET, UART_TX_TIMEOUT);
        check_success_hal(err); // If unsuccessful, print that to debug UART
    }

    /* USER CODE END WHILE */

    /* USER CODE BEGIN 3 */
}

UART receive callback function:

void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart)
{

    if (huart == &SERVER_UART)
    { // One should technically compare huart->Instance, but this works as well...
        new_server_msg = TRUE;
        print_string("UART Callback: Server ISR happened!\r\n"); // Blocking write to debug UART. I know that this is typically considered bad form,
                                                                 // but as the callback function is only called once per receive and because that's the only way of letting me know that the callback has occurred,
                                                                 // I chose to keep the print in.
    }
    else if (huart == &BLE_UART)
    {
        new_ble_msg = TRUE;
        print_string("UART Callback: BLE ISR happened!\r\n");
    }
    else
    {
        print_string("UART Callback: ISR triggered by unknown UART_HandleTypeDef!\r\n");
    }
}
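
For comparison, a different receive scheme from the one above is to receive a single byte at a time, re-arm inside the callback, and let the main loop drain a ring buffer. The following is only a minimal host-side sketch of such a buffer, not the code from this project: `rb_t`, `rb_push`, `rb_count`, `rb_pop_packet` and `RB_SIZE` are hypothetical names. On the target, the callback would presumably call `rb_push` with the received byte and then `HAL_UART_Receive_IT(&BLE_UART, &rx_byte, 1)`, where `rx_byte` is an assumed one-byte buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LEN_BLE_PACKET 24
#define RB_SIZE        128   /* assumption: must be a power of two */

typedef struct {
    volatile uint16_t head;  /* written only by the producer (the ISR)       */
    volatile uint16_t tail;  /* written only by the consumer (the main loop) */
    uint8_t data[RB_SIZE];
} rb_t;

/* Producer side: store one received byte. Returns false if the buffer is full. */
bool rb_push(rb_t *rb, uint8_t byte)
{
    uint16_t next = (uint16_t)((rb->head + 1u) & (RB_SIZE - 1u));
    if (next == rb->tail) {
        return false;        /* full: the byte is dropped */
    }
    rb->data[rb->head] = byte;
    rb->head = next;
    return true;
}

/* Number of bytes currently stored in the buffer. */
size_t rb_count(const rb_t *rb)
{
    return (size_t)((uint16_t)(rb->head - rb->tail) & (RB_SIZE - 1u));
}

/* Consumer side: copy out one complete packet if enough bytes have arrived. */
bool rb_pop_packet(rb_t *rb, uint8_t out[LEN_BLE_PACKET])
{
    if (rb_count(rb) < LEN_BLE_PACKET) {
        return false;
    }
    for (size_t i = 0; i < LEN_BLE_PACKET; i++) {
        out[i] = rb->data[rb->tail];
        rb->tail = (uint16_t)((rb->tail + 1u) & (RB_SIZE - 1u));
    }
    return true;
}
```

With this layout the receiver is re-armed from interrupt context rather than from the main loop, so the 130ms encryption step would no longer delay the re-arm. The trade-off is one callback per byte instead of one per packet.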

What I have tried so far:

I wrote a client implementation in Go and ran it on my computer, where clients would just directly send UDP messages to each other instead of BLE. As that version functioned flawlessly even with many "devices", I am confident that the problem lies squarely with the STM32 and its STM32 <-> ESP32 UART connection.

To get it working with 3 devices, I simply removed most of the debugging statements of the STM32 and made the server wait 250ms between sending X_i to D_{i} and X_{i + 1} to D_{i + 1}. As this seems to have at least made the problem so infrequent that I haven't noticed it anymore, I reckon that the core issue is timing related.

By drawing execution traces, I have already found an inherent weakness in my approach: if an STM32 calls HAL_UART_Receive_IT(&BLE_UART, ble_rx_buf, LEN_BLE_PACKET) while the ESP32 is in the middle of transmitting a packet to the STM32 and has already sent k bytes, the STM32 will only receive LEN_BLE_PACKET - k bytes of that packet. BLE_UART.RxXferCount is then wrong when the ESP32 sends the next packet, so the first k bytes of that next packet are used to complete the current receive.
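
A small host-side model of this weakness, as a sketch only (it is not the firmware, and `fill_stream` and `count_aligned_chunks` are hypothetical helper names): every byte of the simulated stream stores its own offset inside its packet, so a chunk that starts on a real packet boundary begins with 0. The model assumes the receiver is re-armed immediately after each completed chunk and that no further bytes are lost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LEN_BLE_PACKET 24

/* Fill `stream` with n_packets back-to-back packets. Each byte holds its
 * offset within its own packet (0 .. LEN_BLE_PACKET - 1). */
void fill_stream(uint8_t *stream, size_t n_packets)
{
    for (size_t i = 0; i < n_packets * LEN_BLE_PACKET; i++) {
        stream[i] = (uint8_t)(i % LEN_BLE_PACKET);
    }
}

/* Model a fixed-length receive that is armed only after `skipped` bytes of
 * the first packet have already gone by. Stores the number of completed
 * chunks in *total and returns how many of them start on a packet boundary. */
size_t count_aligned_chunks(const uint8_t *stream, size_t stream_len,
                            size_t skipped, size_t *total)
{
    assert(skipped < LEN_BLE_PACKET);

    size_t aligned = 0;
    *total = 0;
    for (size_t pos = skipped; pos + LEN_BLE_PACKET <= stream_len;
         pos += LEN_BLE_PACKET) {
        (*total)++;
        if (stream[pos] == 0) {
            aligned++;
        }
    }
    return aligned;
}
```

In this model, with four packets and `skipped = 0` all four chunks are aligned, whereas with `skipped = 5` three chunks complete and none of them is aligned: a single late re-arm shifts every later chunk, because nothing in a fixed-length receive lets it resynchronize.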

On a more theoretical front, I first considered using DMA instead of interrupt-driven receive. I decided against it, however, because on the STM32 DMA doesn't use descriptor rings as in more powerful systems; it really just removes the overhead of having to handle LEN_BLE_PACKET (resp. LEN_SERVER_FRAMED_PACKET) interrupts per packet.

I have of course also checked Stack Overflow, where several people seem to have experienced similar issues, e.g. "UART receive interrupt stops triggering after several hours of successful receive" and "Uart dma receive interrupt stops receiving data after several minutes".

Questions:

  1. Given what I have described above, how is it possible for the STM32's callback of BLE_UART to simply stop triggering after some time without any apparent reason?

  2. Does it seem plausible that the issue I raised in the last paragraph of "What I have tried so far" is actually the cause of the problem?

  3. How can I fix this issue?


Clarification:

After the server sends a request to a device D_i, the server waits for 250ms before sending the next request to D_{i + 1}. Hence, D_i has a 250ms transmission window in which no other device D_j can transmit anything. I.e. when it's D_i's turn to broadcast its nonce, the other devices simply have to receive one UART message.

As reception from the server is typically rather fast, the decryption takes 130ms and the UART transmit at a baud rate of 115200 is also quick, this window should be long enough.
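
As a rough check of that budget, the time on the wire can be estimated as follows. This is a sketch that assumes 8N1 framing (1 start bit, 8 data bits, 1 stop bit, so 10 bits per byte), which is an assumption and not stated above; `uart_wire_time_us` is a hypothetical helper name.

```c
#include <stdint.h>

/* Time on the wire in microseconds for n_bytes at the given baud rate,
 * assuming 8N1 framing: 1 start + 8 data + 1 stop = 10 bits per byte.
 * The result is truncated to whole microseconds. */
uint32_t uart_wire_time_us(uint32_t n_bytes, uint32_t baud)
{
    const uint64_t bits = (uint64_t)n_bytes * 10u;
    return (uint32_t)(bits * 1000000u / baud);
}
```

Under that assumption, `uart_wire_time_us(24, 115200)` gives 2083 and `uart_wire_time_us(35, 115200)` gives 3038, i.e. about 2.1ms for a BLE packet and about 3.0ms for a framed server packet. Together with the 130ms decryption this stays well inside the 250ms window.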

UPDATE:

After posting the question, I changed the ESP32 such that BLE packets are not immediately forwarded over UART to the STM32. Instead, they are enqueued and a dedicated task in the ESP32 dequeues them with a minimum 5ms delay between packets. Hence, the STM32 should now have a guaranteed 5ms between each BLE packet. This was done to reduce the burstiness (despite there not actually being any bursts due to what is mentioned in the clarification... I was just desperate). Nevertheless, this seems to have made the STM32 "survive" for longer before the UART receiver locks up.

Upvotes: 3

Views: 2233

Answers (1)

dev_nerd

Reputation: 1

You need to be very careful when using the STM32 HAL library in production: in my experience, it isn't reliable when receiving fast, continuous data from a server or any other source.

I will suggest a solution to this problem based on what I did when implementing a similar application. It works well for my Firmware-Over-The-Air (FOTA) project and helps to eliminate possible UART failures when using the STM32 HAL library.

Steps are listed below:

  • Ensure you reset the UART by calling MX_USARTx_UART_Init()
  • Reconfigure the callback either for HAL_UART_Receive_IT() or HAL_UART_Receive_DMA()

These two steps should eliminate UART receive-interrupt failures when using the STM32 HAL.

Upvotes: -2
