Reputation: 4558
I am enumerating the characters of a large character set like this (taking GB2312 as an example, though the real set is much larger):
def get_gb2312_characters():
    higher_range = range(0xb0, 0xf7 + 1)
    lower_range = range(0xa1, 0xfe + 1)
    # see http://en.wikipedia.org/wiki/GB_2312#Encodings_of_GB2312
    for higher in higher_range:
        for lower in lower_range:
            encoding = (higher << 8) | lower
            yield encoding.to_bytes(2, byteorder='big').decode(encoding='gb2312')

for c in get_gb2312_characters():
    print(c)
This won't work because there are some "gaps" (or "garbage" byte combinations) in the code page. When the program tries to get a character from the generator in the last for line, it raises a UnicodeDecodeError. The problem is that I cannot wrap the for loop in try...except like this:
try:
    for c in gb2312:
        print(c)
except UnicodeDecodeError:
    pass
since the loop terminates immediately when an exception is raised. Nor can I put the pair inside the for loop like this:
for c in gb2312:
    try:
        print(c)
    except UnicodeDecodeError:
        pass
because the exception is not raised inside the loop body. Is there any way to get around this? Thank you.
Upvotes: 2
Views: 190
Reputation: 236004
Try using this for loop inside your function:
for higher in higher_range:
    for lower in lower_range:
        encoding = (higher << 8) | lower
        try:
            yield encoding.to_bytes(2, byteorder='big').decode(encoding='gb2312')
        except UnicodeDecodeError:
            pass
The values that fail to decode will be silently skipped, and the generator will yield only the valid characters.
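For reference, a self-contained sketch of the whole function with this change applied might look like the following. The ranges are the ones from the question; the `chars` list and the final `print` are only added here to show usage.

```python
def get_gb2312_characters():
    """Yield each two-byte GB2312 code in the question's ranges that decodes."""
    higher_range = range(0xb0, 0xf7 + 1)
    lower_range = range(0xa1, 0xfe + 1)
    for higher in higher_range:
        for lower in lower_range:
            encoding = (higher << 8) | lower
            try:
                yield encoding.to_bytes(2, byteorder='big').decode(encoding='gb2312')
            except UnicodeDecodeError:
                # Unassigned byte pair (a "gap" in the code page): skip it.
                pass


chars = list(get_gb2312_characters())
print(len(chars), "characters decoded")
```

Because the exception is caught inside the generator, the generator stays alive and simply moves on to the next byte pair, so the consuming for loop needs no try/except at all.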
Upvotes: 5
Reputation: 85442
Put the try/except around the yield:
try:
    yield encoding.to_bytes(2, byteorder='big').decode(encoding='gb2312')
except UnicodeDecodeError:
    # handle exception here
    pass
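As one possible way to fill in the "handle exception here" part, the sketch below limits the try to the decode call and records the byte pairs that failed. The function name `decode_pairs` and the `skipped` list are hypothetical, introduced only for illustration.

```python
def decode_pairs(higher_range, lower_range, skipped):
    """Yield decodable GB2312 characters; append undecodable byte pairs to `skipped`."""
    for higher in higher_range:
        for lower in lower_range:
            raw = bytes([higher, lower])
            try:
                char = raw.decode('gb2312')
            except UnicodeDecodeError:
                # Remember the gap instead of silently dropping it.
                skipped.append(raw)
            else:
                # Yield outside the try block so only decode errors are caught.
                yield char


skipped = []
chars = list(decode_pairs(range(0xb0, 0xf7 + 1), range(0xa1, 0xfe + 1), skipped))
print(len(chars), "decoded,", len(skipped), "skipped")
```

Keeping the yield in the else branch means an exception thrown into the generator by its consumer (for example via generator.throw) is not swallowed by the except clause.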
Upvotes: 4