webwurm

Reputation: 89

Encoding/Decoding of German Umlaute in Python3

I have been working on a problem for a few hours and I can't get it fixed. I'm sure it is just a small thing, but I can't see what I am doing wrong.

My aim is to fetch JSON data from the public transport company and show the next departure times of the metro/tram on a display. Basically everything works, but as soon as the JSON contains an umlaut (like "ü") I get an error message. The interesting thing is that the sharp s (ß) works!

Here is the exact error message (it should be "Hütteldorf"):

UnicodeEncodeError('ascii', u'H\xfctteldorf', 1, 2, 'ordinal not in range(128)')

The part of the code:

...
    apiurl = 'https://www.wienerlinien.at/ogd_realtime/monitor?rbl={rbl}&sender={apikey}'

...

        for rbl in rbls:
            r = requests.get(url, timeout=10)

            ##r.encoding = 'utf-8';
            ##print(r.json())
            ##print(r.encoding)
            ##r.encoding = 'latin1'

            if r.status_code == requests.codes.ok:
                try:
                    for monitor in r.json()['data']['monitors']:
                        rbl.station = monitor['locationStop']['properties']['title'].encode('utf-8')
                        for line in monitor['lines']:

                            #Decoding-Problem is here - ß works, ü doesn't
                            #UnicodeEncodeError('ascii', u'H\xfctteldorf', 1, 2, 'ordinal not in range(128)')
                            rbl.name = str(line['name'])
                            rbl.direction = str(line['towards'])

                            rbl.trafficjam = line['trafficjam'] #Boolean
...

I personally think I tried everything I found that is possible in Python3...encode, decode, ... Every time either the sharp s or the umlaut ü is failing.

Can someone give me a hint in the right direction? Thank you very much!

[Edit:] Here is the full source-code, which has a workaround (ü=ue):

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys, getopt, time
import requests
import smbus

# Define some device parameters
I2C_ADDR  = 0x27 # I2C device address, if any error, change this address to 0x3f
LCD_WIDTH = 20   # Maximum characters per line

# Define some device constants
LCD_CHR = 1 # Mode - Sending data
LCD_CMD = 0 # Mode - Sending command

LCD_LINE_1 = 0x80 # LCD RAM address for the 1st line
LCD_LINE_2 = 0xC0 # LCD RAM address for the 2nd line
LCD_LINE_3 = 0x94 # LCD RAM address for the 3rd line
LCD_LINE_4 = 0xD4 # LCD RAM address for the 4th line

LCD_BACKLIGHT  = 0x08  # On
#LCD_BACKLIGHT = 0x00  # Off

ENABLE = 0b00000100 # Enable bit

# Timing constants
E_PULSE = 0.0005
E_DELAY = 0.0005

#Open I2C interface
bus = smbus.SMBus(1) # Rev 2 Pi uses 1

class RBL:
    id = 0
    line = ''
    station = ''
    direction = ''
    time = -1

def replaceUmlaut(s):
    s = s.replace("Ä", "Ae") # A umlaut
    s = s.replace("Ö", "Oe") # O umlaut
    s = s.replace("Ü", "Ue") # U umlaut
    s = s.replace("ä", "ae") # a umlaut
    s = s.replace("ö", "oe") # o umlaut
    s = s.replace("ü", "ue") # u umlaut
    return s

def lcd_init():
  # Initialise display
  lcd_byte(0x33,LCD_CMD) # 110011 Initialise
  lcd_byte(0x32,LCD_CMD) # 110010 Initialise
  lcd_byte(0x06,LCD_CMD) # 000110 Cursor move direction
  lcd_byte(0x0C,LCD_CMD) # 001100 Display On,Cursor Off, Blink Off 
  lcd_byte(0x28,LCD_CMD) # 101000 Data length, number of lines, font size
  lcd_byte(0x01,LCD_CMD) # 000001 Clear display
  time.sleep(E_DELAY)

def lcd_byte(bits, mode):
  # Send byte to data pins
  # bits = the data
  # mode = 1 for data
  #        0 for command

  bits_high = mode | (bits & 0xF0) | LCD_BACKLIGHT
  bits_low = mode | ((bits<<4) & 0xF0) | LCD_BACKLIGHT

  # High bits
  bus.write_byte(I2C_ADDR, bits_high)
  lcd_toggle_enable(bits_high)

  # Low bits
  bus.write_byte(I2C_ADDR, bits_low)
  lcd_toggle_enable(bits_low)

def lcd_toggle_enable(bits):
  # Toggle enable
  time.sleep(E_DELAY)
  bus.write_byte(I2C_ADDR, (bits | ENABLE))
  time.sleep(E_PULSE)
  bus.write_byte(I2C_ADDR,(bits & ~ENABLE))
  time.sleep(E_DELAY)

def lcd_string(message,line):
  # Send string to display

  message = message.ljust(LCD_WIDTH," ")

  lcd_byte(line, LCD_CMD)

  for i in range(LCD_WIDTH):
    lcd_byte(ord(message[i]),LCD_CHR)


def main(argv):

    apikey = False
    apiurl = 'https://www.wienerlinien.at/ogd_realtime/monitor?rbl={rbl}&sender={apikey}'

    #Time between updates
    st = 10

    # Initialise display
    lcd_init()
    lcd_string("Willkommen!",LCD_LINE_2)

    try:
        opts, args = getopt.getopt(argv, "hk:t:", ["help", "key=", "time="])
    except getopt.GetoptError:
        usage()
        sys.exit(2)
    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage()
            sys.exit()
        elif opt in ("-k", "--key"):
            apikey = arg
        elif opt in ("-t", "--time"):
            try:
                tmpst = int(arg)
                if tmpst > 0:
                    st = tmpst
            except ValueError:
                usage()
                sys.exit(2)


    if apikey == False or len(args) < 1:
        usage()
        sys.exit()

    rbls = []
    for rbl in args:
        tmprbl = RBL()
        tmprbl.id = rbl
        rbls.append(tmprbl)

    x = 1
    while True:
        for rbl in rbls:
            url = apiurl.replace('{apikey}', apikey).replace('{rbl}', rbl.id)
            r = requests.get(url, timeout=10)
            r.encoding = 'utf-8'

            if r.status_code == requests.codes.ok:
                try:
                    for monitor in r.json()['data']['monitors']:
                        rbl.station = monitor['locationStop']['properties']['title']
                        for line in monitor['lines']:

                            rbl.name = replaceUmlaut(str(line['name'].encode('ascii','xmlcharrefreplace').decode('ascii')))
                            rbl.direction = replaceUmlaut(str(line['towards'].encode('ascii','xmlcharrefreplace').decode('ascii')))

                            rbl.trafficjam = line['trafficjam']
                            rbl.type = line['type']
                            rbl.time1 = line['departures']['departure'][0]['departureTime']['countdown']
                            rbl.time2 = line['departures']['departure'][1]['departureTime']['countdown']
                            rbl.time3 = line['departures']['departure'][2]['departureTime']['countdown']

                            lcdShow(rbl)
                            time.sleep(st)

                except Exception as e:
                    print("Fehler (Exc): " + repr(e))
                    print(r)
                    lcd_string("Fehler (Exc):",LCD_LINE_1)
                    lcd_string(repr(e),LCD_LINE_2)
                    lcd_string("",LCD_LINE_3)
                    lcd_string("",LCD_LINE_4)
            else:
                print('Fehler bei Kommunikation mit Server')
                lcd_string("Fehler:",LCD_LINE_1)
                lcd_string("Serverkomm.",LCD_LINE_2)
                lcd_string("",LCD_LINE_3)
                lcd_string("",LCD_LINE_4)

def lcdShow(rbl):
    lcdLine1 = rbl.name + ' ' + rbl.station
    lcdLine2 = rbl.direction

    lcdLine3 = "".ljust(LCD_WIDTH-9) + ' ' + '{:0>2d}'.format(rbl.time1) + ' ' + '{:0>2d}'.format(rbl.time2) + ' ' + '{:0>2d}'.format(rbl.time3)

    if not rbl.type == "ptMetro":
        if rbl.trafficjam:
            lcdLine4 = "Stau in Zufahrt"
        else:
            lcdLine4 = "kein Stau"
    else:
        lcdLine4 = ""

    lcd_string(lcdLine1,LCD_LINE_1)
    lcd_string(lcdLine2,LCD_LINE_2)
    lcd_string(lcdLine3,LCD_LINE_3)
    lcd_string(lcdLine4,LCD_LINE_4)

    #print(lcdLine1 + '\n' + lcdLine2+ '\n' + lcdLine3+ '\n' + lcdLine4)

def usage():
    print('usage: ' + __file__ + ' [-h] [-t time] -k apikey rbl [rbl ...]\n')
    print('arguments:')
    print('  -k, --key=\tAPI key')
    print('  rbl\t\tRBL number\n')
    print('optional arguments:')
    print('  -h, --help\tshow this help')
    print('  -t, --time=\ttime between station updates in seconds, default 10')

if __name__ == "__main__":
    main(sys.argv[1:])

Upvotes: 1

Views: 5185

Answers (1)

Weeble

Reputation: 17930

"I personally think I tried everything I found that is possible in Python3...encode, decode, ... Every time either the sharp s or the umlaut ü is failing."

As noted in the comments, you appear to be running Python 2 based on the error messages you're seeing.
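One quick way to confirm which interpreter is actually running the script is to ask it directly. The sketch below uses a hypothetical helper, describe_interpreter, which is not part of the original code. It reports the major version and the name of the text type, which is unicode on Python 2 and str on Python 3. The u'...' prefix in the error message is a hint as well, since Python 3 does not print that prefix.

```python
import sys


def describe_interpreter():
    """Return the major Python version and the name of its text type."""
    major = sys.version_info[0]
    # u"" is a text string on both major versions: unicode on 2, str on 3.
    text_type_name = type(u"").__name__
    return "Python {} (text type: {})".format(major, text_type_name)


print(describe_interpreter())
```

Placing a call like this at the top of the script shows whether the shebang line resolves to the interpreter you expect.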

Python 2 has two 'string' types: str, which contains raw bytes, and unicode, which contains Unicode characters. When you call .json() you get back a data structure containing unicode strings. So line['name'] is one such unicode string.

When you call str(line['name']) you are implicitly asking to encode the unicode string into a sequence of ASCII bytes. This fails as ASCII cannot represent these characters. Unfortunately I don't know why you're trying to do this here. Does rbl.name need to be a str? Where is it used? What encoding is it expected to be in by other code using it?
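The implicit conversion can be made visible by doing it explicitly. This is a sketch written for Python 3, where the ASCII encoding step has to be requested by hand; the variable names are illustrative only. It fails at the same character position (1 to 2) that the error message in the question reports.

```python
name = "H\u00fctteldorf"  # "Hütteldorf", as returned by the JSON parser

err = None
try:
    # Python 2's str(unicode_value) performs this ASCII encoding implicitly.
    name.encode("ascii")
except UnicodeEncodeError as exc:
    err = exc

if err is not None:
    # The failing slice is name[1:2], i.e. the "ü".
    print(err.encoding, err.start, err.end, err.reason)
```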

In the comments, Jorropo suggests writing line['name'].decode("utf-8") which you indicate also doesn't work. This is because it doesn't really make sense to de-code a unicode string, but Python 2 will try anyway by first en-coding it in ASCII (which fails) before attempting to decode in UTF-8 as you requested.
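To illustrate the two hidden steps, here is a rough Python 3 emulation of what Python 2 does when .decode() is called on a unicode string. The helper py2_style_decode is hypothetical and exists only for this sketch. Note that the failure is an encode error, even though the caller asked for a decode.

```python
def py2_style_decode(text, encoding):
    """Emulate Python 2's unicode.decode(): ASCII-encode first, then decode."""
    raw = text.encode("ascii")   # hidden first step; fails for non-ASCII text
    return raw.decode(encoding)  # the step the caller actually asked for


# Pure ASCII text survives the round trip unchanged.
print(py2_style_decode("Karlsplatz", "utf-8"))

failure = None
try:
    py2_style_decode("H\u00fctteldorf", "utf-8")
except UnicodeError as exc:
    failure = type(exc).__name__

print(failure)
```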

Your fix is going to depend on what you're doing with rbl.name. You might:

  1. Just use the unicode string directly: rbl.name = line['name']. This requires that subsequent code expects a unicode string.
  2. Encode it into UTF-8 bytes: rbl.name = line['name'].encode('utf-8'). This requires that subsequent code expects a sequence of UTF-8 bytes.
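The difference between the two options can be seen in a small sketch (Python 3 syntax, illustrative variable names). The text form has one element per character, while the UTF-8 form uses two bytes for the "ü", so code that indexes or pads the value will behave differently depending on which form it receives.

```python
title = "H\u00fctteldorf"  # "Hütteldorf"

as_text = title                   # option 1: keep the text string
as_bytes = title.encode("utf-8")  # option 2: UTF-8 encoded bytes

print(len(as_text))   # number of characters
print(len(as_bytes))  # number of bytes; "ü" takes two in UTF-8
print(as_bytes)

# Decoding with the same codec restores the original text.
assert as_bytes.decode("utf-8") == title
```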

Either way, it's possible (or even probable) that something else will subsequently break when you try either of these, depending entirely on what assumptions the rest of the code makes about what rbl.name is supposed to be and how it's encoded.
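As an example of such an assumption: the lcd_string function in the question sends ord(message[i]) to the display one character at a time, so it effectively expects every character to fit the display's own character set. If the display is treated as ASCII-only (an assumption; whether it can render umlauts natively depends on the hardware and is not covered here), a transliteration step before display is one option. The sketch below is written for Python 3, and the names UMLAUT_TABLE and to_display_ascii are hypothetical.

```python
# Transliteration table for German umlauts and the sharp s.
UMLAUT_TABLE = str.maketrans({
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ä": "ae", "ö": "oe", "ü": "ue",
    "ß": "ss",
})


def to_display_ascii(text):
    """Transliterate German characters, then replace any other non-ASCII with '?'."""
    replaced = text.translate(UMLAUT_TABLE)
    return replaced.encode("ascii", "replace").decode("ascii")


print(to_display_ascii("Hütteldorf"))
print(to_display_ascii("Westbahnstraße"))
```

With this approach rbl.name and rbl.direction stay text strings throughout, and the conversion happens only at the point where the display's limits apply.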

As for why it works with u'Westbahnstraße' I couldn't say for sure. Can you provide a complete example including input data that demonstrates one working and the other not working?

Upvotes: 1
