Reputation: 4558

Ignore exceptions in a `for` statement

I am enumerating the characters of a large character set like this (GB2312 is used as an example; the real set is much larger):

def get_gb2312_characters():
    higher_range = range(0xb0, 0xf7 + 1)
    lower_range = range(0xa1, 0xfe + 1)
    # see http://en.wikipedia.org/wiki/GB_2312#Encodings_of_GB2312

    for higher in higher_range:
        for lower in lower_range:
            encoding = (higher << 8) | lower
            yield encoding.to_bytes(2, byteorder='big').decode(encoding='gb2312')

for c in get_gb2312_characters():
    print(c)

This doesn't work because there are some "gaps" (invalid byte combinations) in the code page. When the final for loop asks the generator for the next character and hits such a gap, the generator raises a UnicodeDecodeError. The problem is that I cannot wrap the whole for loop in try...except like

try:
    for c in gb2312:
        print(c)
except UnicodeDecodeError:
    pass

since the loop terminates immediately when an exception is raised. Nor can I put the try...except inside the for loop like

for c in gb2312:
    try:
        print(c)
    except UnicodeDecodeError:
        pass

because the exception is raised inside the generator, not inside the loop body. Is there any way to get around this? Thank you.

Upvotes: 2

Answers (2)

Óscar López

Reputation: 236004

Try using this for loop inside your function:

for higher in higher_range:
    for lower in lower_range:
        encoding = (higher << 8) | lower
        try:
            yield encoding.to_bytes(2, byteorder='big').decode(encoding='gb2312')
        except UnicodeDecodeError:
            pass

The values that fail to decode will be silently ignored, and the generator will yield only the valid ones.
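For reference, here is a minimal, self-contained sketch of the whole generator with the exception handled inside it. It is a slight variation on the loop above, not the only way to write it: the decode happens inside the try and the yield happens after it, so the except clause covers only the decoding step. Building the two bytes with bytes([higher, lower]) is an assumption made for brevity; it is equivalent to the to_bytes call in the question.

```python
def get_gb2312_characters():
    # Byte ranges taken from the question (GB2312 hanzi rows).
    higher_range = range(0xB0, 0xF7 + 1)
    lower_range = range(0xA1, 0xFE + 1)

    for higher in higher_range:
        for lower in lower_range:
            raw = bytes([higher, lower])
            try:
                char = raw.decode('gb2312')
            except UnicodeDecodeError:
                # Unassigned byte pair ("gap"): skip it and keep going.
                continue
            yield char


characters = list(get_gb2312_characters())
print(len(characters), characters[0])
```

Because the exception never leaves the generator, the consuming for loop needs no try...except at all.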

Upvotes: 5

Mike Müller

Reputation: 85442

Put the try/except around the yield:

try:
    yield encoding.to_bytes(2, byteorder='big').decode(encoding='gb2312')
except UnicodeDecodeError:
    # handle exception here
    pass
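If you want to do something in the handler other than pass, one option is to record the byte pairs that failed. The following is only a sketch under assumptions: the function name decode_pairs and its skipped parameter are hypothetical and not part of the question's code.

```python
def decode_pairs(higher_range, lower_range, codec='gb2312', skipped=None):
    # decode_pairs and the `skipped` list are hypothetical helpers for illustration.
    for higher in higher_range:
        for lower in lower_range:
            raw = bytes([higher, lower])
            try:
                char = raw.decode(codec)
            except UnicodeDecodeError:
                # Handle the exception here: remember the invalid pair, then move on.
                if skipped is not None:
                    skipped.append(raw)
                continue
            yield char


skipped = []
chars = list(decode_pairs(range(0xB0, 0xF7 + 1), range(0xA1, 0xFE + 1), skipped=skipped))
print(len(chars), 'decoded;', len(skipped), 'skipped')
```

The skipped list is filled as the generator is consumed, so it is complete only after the loop over the generator has finished.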

Upvotes: 4
