UTF-8 problems in writing a UART-Console on a microcontroller

Question

I am currently writing a UART console on an ATmega1284P. It is supposed to echo characters back, so that the console on the computer side actually shows what is being typed, and that is all for now.

Here is the problem: with ASCII it works perfectly fine, but if I send anything beyond ASCII, e.g. a '§', my minicom shows "�§". I would expect either '�' alone (if the character were invalid) or '§' alone (if everything worked). Getting the combination of both throws me off, and I currently have no idea where the problem is!

Here is part of my code:

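    char c;
    while(m_uart->recv(c) > 0) {
        m_lineBuff[m_lineIndex++] = c;
        if(c == '\r') {
            // Enter arrives as '\r': append '\n' and echo both bytes
            c = '\n';
            m_lineBuff[m_lineIndex++] = c;
            m_sendCount = 2;
        } else {
            // any other byte is echoed on its own
            m_sendCount = 1;
        }
        this->send();
        if(c == '\n') {
            // line complete: terminate the string and reset the buffer
            m_lineBuff[m_lineIndex++] = '\0';
            // invoke some callbacks that handle the line at some point
            m_lineIndex = 0;
        }
    }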

m_lineBuff is a self-written (and tested) vector of chars. m_uart is a self-written (and also tested) UART driver for the micro's internal hardware UART. this->send() sends m_sendCount bytes using m_uart.

What I tried so far: I verified that the baud rates of minicom and my micro match (115200). I verified that the baud rate error is within the 2% range (the micro is running at 20 MHz). Both minicom and the micro are set up for 8N1. I verified that minicom works by hooking it up to a little board I had lying around. On that board any UTF-8 character works just fine.

Does anyone see my mistake, or does anyone have a clue as to what I haven't considered?
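For reference, this is roughly how the baud rate error can be checked on the host. It is only a sketch: the helper names are made up for illustration, and it assumes the USART runs in normal asynchronous mode (U2X = 0), where the datasheet formula is BAUD = f_osc / (16 * (UBRR + 1)).

```cpp
#include <stdint.h>

// Hypothetical helpers, not part of my driver.
// Assumption: normal asynchronous mode (U2X = 0).

// UBRR value closest to the requested baud rate.
inline uint16_t nearestUbrr(double fCpu, double baud) {
    return static_cast<uint16_t>(fCpu / (16.0 * baud) - 1.0 + 0.5);
}

// Baud rate the hardware really produces for a given UBRR value.
inline double actualBaud(double fCpu, uint16_t ubrr) {
    return fCpu / (16.0 * (ubrr + 1));
}

// Relative deviation from the requested baud rate, in percent.
inline double baudErrorPercent(double fCpu, double baud) {
    const double actual = actualBaud(fCpu, nearestUbrr(fCpu, baud));
    return (actual / baud - 1.0) * 100.0;
}
```

With 20 MHz and 115200 baud this gives UBRR = 10 and an error of about -1.4%, which is inside the 2% range mentioned above.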

I'll be happy to supply up to all of my code if you guys are interested in it.

EDIT/Elaboration:

Observation 1 (prior to starting this project)

The PC-side program (minicom) can send characters to and receive characters from the microcontroller. It does not show the sent characters, though.

Conclusion 1 (prior to starting this project)

The microcontroller side needs to send the characters back to the PC so that it behaves like a console. Thus I immediately send back any character I receive.

Observation 2 (after implementing it)

When I press '§' (or any other character consisting of more than 1 byte) (using minicom) I see "�§".

Conclusion 2 (after implementing it)

Something I can't explain with my knowledge is going on. Maybe a small delay between the two bytes making up the character leads minicom to print a '�' first, because the first byte on its own is indeed an invalid character. When the second byte comes in, minicom realizes that it is actually '§', but it doesn't remove or overwrite the '�'. If that is the problem, then how do I solve it? Does my microcontroller need to react faster, with less delay between the bytes?
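One way to test this hypothesis would be to hold back the bytes of a multi-byte character and echo them back-to-back once the sequence is complete. The following is only a sketch of that idea: Utf8Collector and utf8SequenceLength are hypothetical names, not part of my existing code, and I have not confirmed that this changes what minicom displays.

```cpp
#include <stddef.h>
#include <stdint.h>

// Total length of a UTF-8 sequence, derived from its lead byte.
// Returns 0 for a continuation byte (10xxxxxx) or an invalid lead byte.
// Note: overlong encodings are not rejected here.
inline size_t utf8SequenceLength(uint8_t lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Hypothetical helper: collects bytes until one full sequence is available.
class Utf8Collector {
public:
    // Feed one received byte. Returns the number of bytes that are ready
    // to be echoed (readable via data()), or 0 while a sequence is incomplete.
    size_t push(uint8_t byte) {
        if (m_expected == 0) {
            // Waiting for a lead byte.
            m_count = 0;
            m_expected = utf8SequenceLength(byte);
            if (m_expected == 0) {
                // Stray continuation byte: pass it through unchanged.
                m_bytes[0] = byte;
                return 1;
            }
        } else if ((byte & 0xC0) != 0x80) {
            // Sequence was cut short: drop the partial bytes and
            // treat this byte as the start of a new sequence.
            m_expected = 0;
            return push(byte);
        }
        m_bytes[m_count++] = byte;
        if (m_count == m_expected) {
            m_expected = 0;
            return m_count;
        }
        return 0;
    }

    const uint8_t* data() const { return m_bytes; }

private:
    uint8_t m_bytes[4] = {};
    size_t m_count = 0;
    size_t m_expected = 0;
};
```

In the receive loop, the return value of push() would take the place of m_sendCount: nothing is sent while it is 0, and all bytes of the character go out in one burst once it is non-zero.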

EDIT2:

I replaced the '?' with the actual character '�' using the power of copy and paste.

More tests I did

I tried the character '😹' and, as I expected (it backs my conclusion 2), I got "��😹". '😹', by the way, is a 4-byte character. I set the baud rate of both the micro and minicom to 9600: exact same behaviour. I managed to set minicom into hex mode: it sends normally but displays what it receives as hex. When I send '😹' I get "f0 9f 98 b9", which (at least according to this site) is correct. Is that backing my conclusion 2? And more importantly: how do I get rid of that behaviour? It works with my little Linux board instead of my micro.
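The hex dump can also be checked by hand by decoding the bytes back into a code point. A minimal sketch of such a check (decodeUtf8 is a hypothetical helper written just for this, and it does not reject overlong encodings or surrogates):

```cpp
#include <stddef.h>
#include <stdint.h>

// Decodes exactly one UTF-8 sequence of the given length.
// Returns the code point, or 0xFFFFFFFF if the bytes are malformed.
inline uint32_t decodeUtf8(const uint8_t* bytes, size_t length) {
    const uint32_t kInvalid = 0xFFFFFFFFu;
    if (length == 0) return kInvalid;

    const uint8_t lead = bytes[0];
    size_t expected = 0;
    uint32_t codePoint = 0;
    if (lead < 0x80)                { expected = 1; codePoint = lead; }
    else if ((lead & 0xE0) == 0xC0) { expected = 2; codePoint = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { expected = 3; codePoint = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { expected = 4; codePoint = lead & 0x07; }
    else return kInvalid;           // continuation byte or invalid lead byte

    if (length != expected) return kInvalid;

    // Each continuation byte (10xxxxxx) contributes six payload bits.
    for (size_t i = 1; i < length; ++i) {
        if ((bytes[i] & 0xC0) != 0x80) return kInvalid;
        codePoint = (codePoint << 6) | (bytes[i] & 0x3F);
    }
    return codePoint;
}
```

Feeding it f0 9f 98 b9 yields 0x1F639, which is the code point of '😹', and c2 a7 yields 0xA7, which is '§'. So the bytes coming back from the micro are the correct encoding.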
