Prongs

Reputation: 39

Python: Unexpected result when byte 0xe0 is read

import msvcrt
while True:
    try:
        a=msvcrt.getch()
        a=a.decode('utf-8')
        print(a)
    except:
        print(a)

The above piece of code yields unexpected results when I input arrow keys or page up/page down/delete etc.

The output is as follows:
[I/P=a]
a #expected result
[I/P=UP ARROW]
b'\xe0'
H  #unexpected result

I can understand b'\xe0' being printed, but why is H being printed along with it? H is not printed when I do this:

import msvcrt
a=msvcrt.getch()
print(a)#b'\xe0'
a=a.decode('utf-8')
print(a)
When I input UP ARROW here, it raises a UnicodeDecodeError.

I've looked at the other question that explains how msvcrt.getch() works, but that still fails to explain why I get two characters in the first piece of code and only one character in the second piece of code. Instead of waiting for the next character to be entered, why does a assume the value b'H'?

Upvotes: 1

Views: 489

Answers (1)

RJHunter

Reputation: 2867

Arrow keys (and function keys and others) need two separate calls to msvcrt.getch. When you press the up arrow, the first call returns b'\xe0' and the second returns b'\x48'. Neither of these is UTF-8 or even ASCII text. The first is not a valid UTF-8 sequence on its own, which is why your decode('utf-8') call throws an exception. The second is a byte value representing keycode 72, which coincidentally happens to be the same byte value that represents the letter 'H' in UTF-8 or ASCII.
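This also explains the output of the loop in the question: the first iteration gets b'\xe0', fails to decode it, and prints the raw bytes from the except block. The second iteration then calls getch again, which returns the pending keycode byte immediately instead of waiting for a new keypress. The sketch below hard-codes the two bytes for the up arrow so that it runs on any platform; the variable names are only for illustration.

```python
# The two bytes described above for the up arrow, hard-coded here
# (an assumption for illustration) instead of being read via msvcrt.getch().
prefix = b'\xe0'
keycode = b'\x48'

# A lone 0xe0 byte is the start of a multi-byte UTF-8 sequence,
# so decoding it by itself fails.
try:
    prefix.decode('utf-8')
    prefix_decodes = True
except UnicodeDecodeError:
    prefix_decodes = False

print(prefix_decodes)            # False
print(keycode.decode('utf-8'))   # H
print(keycode[0])                # 72
```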

From the msvcrt documentation (emphasis mine):

msvcrt.getch()

Read a keypress and return the resulting character as a byte string. Nothing is echoed to the console. This call will block if a keypress is not already available, but will not wait for Enter to be pressed. If the pressed key was a special function key, this will return '\000' or '\xe0'; the next call will return the keycode. The Control-C keypress cannot be read with this function.

You can see the bytes coming through using a program like this one:

import msvcrt

# Prefix bytes that announce a special key; the next getch() returns its keycode.
NEXT_CHARACTER_IS_KEYCODE = [b'\xe0', b'\x00']

while True:
    ch1 = msvcrt.getch()
    print("Main getch(): {}".format(ch1))
    if ch1 in NEXT_CHARACTER_IS_KEYCODE:
        ch2 = msvcrt.getch()
        print("  keycode getch(): {}".format(ch2))

Note that there's no .decode('utf-8') in there since getch doesn't return UTF-8 bytes anyway.
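If you want to treat each keypress as one unit, the two-call pattern can be wrapped in a small helper. This is only a sketch: read_key and SPECIAL_PREFIXES are hypothetical names, not part of msvcrt, and the helper takes the getch function as an argument so it can be tried with simulated input on any platform.

```python
# Prefix bytes that signal a special key (per the msvcrt documentation).
SPECIAL_PREFIXES = (b'\x00', b'\xe0')

def read_key(getch):
    """Read one logical keypress using the supplied getch callable.

    Returns (is_special, code). For an ordinary key, code is the byte
    returned by getch. For a special key, code is the keycode byte
    returned by the second getch call.
    """
    first = getch()
    if first in SPECIAL_PREFIXES:
        return True, getch()
    return False, first

# Simulated input: the letter 'a', then the up arrow (prefix + keycode).
fake_input = iter([b'a', b'\xe0', b'H'])

def fake_getch():
    return next(fake_input)

print(read_key(fake_getch))  # (False, b'a')
print(read_key(fake_getch))  # (True, b'H')
```

On Windows you would pass the real function instead, as in read_key(msvcrt.getch). Returning the is_special flag keeps the up arrow's keycode b'H' distinguishable from the letter H.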

(note: make sure you really want to use msvcrt.getch, since that's a pretty unusual choice, especially in 2019.)


Upvotes: 2
