Can Ibanoglu

Reputation: 604

Python curses prints two characters when adding a utf-8 encoded string

I have come across a very weird problem when trying to print UTF-8 encoded strings to a curses window. Here's the code; I'll describe the exact problem and the things I have tried below it.

# coding=UTF-8
import curses
import locale
import time
locale.setlocale(locale.LC_ALL, '')
code = locale.getpreferredencoding()



class AddCharCommand(object):
    def __init__(self, window, line_start, y, x, character):
        """
        Command class for adding the specified character, to the specified
        window, at the specified coordinates.
        """
        self.window = window
        self.line_start = line_start
        self.x = x
        self.y = y
        self.character = character


    def write(self):
        if self.character > 127:
            # curses somehow returns a keycode that is 64 lower than what it
            # should be, this takes care of the problem.
            self.character += 64
            self.string = unichr(self.character).encode(code)
            self.window.addstr(self.y, self.x, self.string)
        else:
            self.window.addch(self.y, self.x, self.character)


    def delete(self):
        """
        Erase characters usually print two characters to the curses window.
        As such both the character at these coordinates and the one next to it
        (that is the one self.x + 1) must be replaced with the a blank space.
        Move to cursor the original coordinates when done.
        """
        for i in xrange(2):
            self.window.addch(self.y, self.x + i, ord(' '))
        self.window.move(self.y, self.x)

def main(screen):
    maxy, maxx = screen.getmaxyx()
    q = 0
    commands = list()
    x = 0
    erase = ord(curses.erasechar())
    while q != 27:
        q = screen.getch()
        if q == erase:
            # delete() returns None, so there is nothing to keep from this call.
            commands.pop().delete()
            x -= 1
            continue
        command = AddCharCommand(screen, 0, maxy/2, x, q)
        commands.append(command)
        command.write()
        x += 1

curses.wrapper(main)

Here's a Gist link to it.

The problem is that when I press the è key (code point 232), it doesn't print just that character. Instead, the string ăè is printed at the given coordinates. I have tried using self.window.addstr(self.x, self.y, self.string[1]), but that just resulted in gibberish being printed.

I then fired up a Python prompt to see the return value of unichr(232).encode('utf-8') and it is indeed a string of length 2.

The very unexpected behaviour is that if I put screen.addstr(4, 4, unichr(232).encode(code)) in main, it correctly displays the è character, and only that character. This is also the case if I make the write method of the AddCharCommand class print the è character no matter what.

The problem, of course, is not limited to è only, it is pretty much the case with all the extended-ASCII characters.

I know that extended-ASCII with curses is a bit flaky but I just can't understand this behaviour at all. It doesn't make any sense (to me) that the code works as expected if I hardcode the ASCII code, but it adds another character if I don't.

I have looked around and read quite a lot of stuff on curses, but I haven't been able to find a solution to this. I would greatly appreciate any help on this matter, it is driving me mad.

Maybe a bit less important, but I would love it if someone could explain to me why screen.getch() returns the incorrect ASCII code for characters above 127 and why the difference between the real ASCII code and the one returned by curses is 64.

Thank you very much in advance.

Upvotes: 0

Views: 2957

Answers (1)

FloriOn

Reputation: 287

It's working fine for me with get_wch() (available since Python 3.3) instead of getch():

c = screen.get_wch()
screen.addch(c)
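
A possible explanation for the ăè output in the question, sketched under the assumption that the terminal runs in a UTF-8 locale: getch() then delivers one byte per call, so è arrives as 195 and 168, and adding 64 to each gives ă (259) and è (232). That would also account for the apparent offset of 64. The helper names below (naive_offsets, decode_key_bytes) are hypothetical, and the sketch is written for Python 3 rather than the Python 2 of the question. Where get_wch() is not available, an incremental decoder is one way to assemble the bytes before printing:

```python
import codecs


def naive_offsets(byte_values):
    """Mimic the question's write(): add 64 to every value above 127."""
    return "".join(chr(b + 64) if b > 127 else chr(b) for b in byte_values)


def decode_key_bytes(byte_values, encoding="utf-8"):
    """Assemble byte values, fed one at a time, into complete characters."""
    decoder = codecs.getincrementaldecoder(encoding)()
    chars = []
    for b in byte_values:
        # The decoder returns "" until a multi-byte sequence is complete.
        text = decoder.decode(bytes([b]))
        if text:
            chars.append(text)
    return chars


# Assumed values getch() would deliver for one press of the è key (U+00E8).
e_grave = list("\u00e8".encode("utf-8"))

print(e_grave)                    # the two UTF-8 bytes of è
print(naive_offsets(e_grave))     # what the +64 workaround produces
print(decode_key_bytes(e_grave))  # the single character actually typed
```

In a curses loop, each value below 256 returned by getch() would be fed to the decoder, and addstr() called only once it yields a character. Key codes such as curses.KEY_LEFT are above 255 and would need separate handling.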

Upvotes: 2
