Reputation: 563

Printing all unicode characters in Python

I've written some code to create all 4-digit combinations of the hexadecimal system, and now I'm trying to use them to print out all the Unicode characters associated with those values. Here's the code I'm using to do this:

char_list = ["0","1","2","3","4","5","6","7","8","9","A","B","C","D","E","F"]
pairs = []
all_chars = []

# Construct pairs list
for char1 in char_list:
    for char2 in char_list:
        pairs.append(char1 + char2)

# Combine every pair with every other pair: all 4-digit codes, 0000 to FFFF
for pair1 in pairs:
    for pair2 in pairs:
        all_chars.append(pair1 + pair2)

# Print all characters
for code in all_chars:
    expression = "u'\u" + code + "'"
    print "{}: {}".format(code,eval(expression))

And here is the error message I'm getting:

Traceback (most recent call last):
  File "C:\Users\andr7495\Desktop\unifun.py", line 18, in <module>
    print "{}: {}".format(code,eval(expression))
UnicodeEncodeError: 'ascii' codec can't encode character u'\x80' in position 0: ordinal not in range(128)

The exception is raised when the code tries to print u"\u0080"; however, I can print that character in the interactive interpreter without a problem.

I've tried casting the results to unicode and telling it to ignore errors, but that doesn't help. I feel like I'm missing a basic understanding of how Unicode works. Is there anything I can do to get my code to print out all valid Unicode characters?
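As a side note on the approach in the question: the four-digit hex strings built by the nested loops are exactly the integers 0x0000 to 0xFFFF formatted with format(i, "04X"), and each code can be turned into its character with chr(int(code, 16)) rather than eval. A minimal Python 3 sketch, where the function names are hypothetical and only serve to compare the two approaches:

```python
from itertools import product

HEX_DIGITS = "0123456789ABCDEF"

def codes_by_nested_loops():
    """Rebuild the question's list: every 2-digit pair, then every pair of pairs."""
    pairs = [a + b for a, b in product(HEX_DIGITS, repeat=2)]
    return [p1 + p2 for p1, p2 in product(pairs, repeat=2)]

def codes_by_range():
    """The same 65,536 strings, generated directly from the integers 0x0000-0xFFFF."""
    return [format(i, "04X") for i in range(0x10000)]

# Both approaches give the same list in the same order.
print(codes_by_nested_loops() == codes_by_range())  # True
# A hex code string maps to its character without eval.
print(chr(int("0041", 16)))  # A
```

Generating the codes from a range also avoids building strings of source code just to evaluate them.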

Upvotes: 4

Answers (5)

wucaibuyi

Reputation: 1

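import unicodedata

# Print the code point, the character and its official Unicode name.
# unicodedata.name() raises ValueError for code points that have no name
# (unassigned, surrogate, private-use and most control characters).
for i in range(0x1000, 0xFFFF + 1):
    try:
        print(f"U+{i:06X}\t{chr(i)}\t{unicodedata.name(chr(i))}")
    except ValueError:
        print(f"U+{i:06X} has no character name")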
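unicodedata.name() also accepts a default value, which it returns instead of raising ValueError when a code point has no name. A minimal Python 3 sketch of one table row built that way, without try/except (describe is a hypothetical helper name):

```python
import unicodedata

def describe(code_point):
    """Return a table row for one code point, or a note if it has no name."""
    ch = chr(code_point)
    name = unicodedata.name(ch, None)  # returns None instead of raising ValueError
    if name is None:
        return f"U+{code_point:06X} has no character name"
    return f"U+{code_point:06X}\t{ch}\t{name}"

for cp in (0x0041, 0xE000):  # a named letter and a private-use code point
    print(describe(cp))
```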

Upvotes: 0

Bimo

Reputation: 6665

Here's a rewrite of the examples in this article that saves the list to a file.

Python 3.x:

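import sys

txtfile = "unicode_table.txt"
print("creating file: " + txtfile)
# errors="ignore" drops characters UTF-16 cannot encode (lone surrogates U+D800-U+DFFF)
with open(txtfile, "w", encoding="utf-16", errors="ignore") as f:
    # sys.maxunicode is the highest code point; add 1 so it is included
    for uc in range(sys.maxunicode + 1):
        print("%s %s" % (hex(uc), chr(uc)), file=f)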
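The errors='ignore' argument matters because the 2,048 surrogate code points (U+D800 to U+DFFF) cannot be encoded in UTF-16 on their own; with 'ignore', their characters are silently dropped and only the hex value remains on those lines. A small Python 3 check of that behavior, assuming a single lone surrogate is representative of the whole range:

```python
lone = "\ud800"  # a lone high surrogate, as produced by chr(0xD800)

# Strict encoding (the default) fails for a lone surrogate.
try:
    lone.encode("utf-16")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With errors="ignore" the surrogate is dropped; only the 2-byte BOM is left.
ignored = lone.encode("utf-16", errors="ignore")

print(strict_ok, len(ignored))  # False 2
```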

Upvotes: 0

Mark Ransom

Reputation: 308548

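You're trying to format a Unicode character into a byte string, so Python 2 has to convert the result with the default ASCII codec, which cannot represent u'\x80'. You can remove the error by using a Unicode format string instead: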

print u"{}: {}".format(code,eval(expression))
      ^

That said, the other answers do a better job of simplifying the original problem; you're definitely doing things the hard way.
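For comparison, in Python 3 every str is Unicode, so the same formatting works without a u prefix, and eval can be replaced by chr(int(code, 16)): int(code, 16) parses the hex string and chr() returns the character for that code point. A minimal sketch (format_code is a hypothetical helper name):

```python
def format_code(code):
    """Format one 4-digit hex code the way the question's loop does, without eval."""
    return "{}: {}".format(code, chr(int(code, 16)))

print(format_code("0041"))  # 0041: A
```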

Upvotes: 0

Michał Šrajer

Reputation: 31192

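import sys
# sys.maxunicode is the highest code point (0xFFFF on narrow builds,
# 0x10FFFF on wide builds); add 1 so it is included
for i in xrange(sys.maxunicode + 1):
    print unichr(i)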
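In Python 3, xrange and unichr became range and chr. A sketch of the same loop under those names, assuming output goes to a UTF-8 text stream: it skips the surrogate range because lone surrogates cannot be encoded there. The function name and its out and stop parameters are hypothetical, added so the loop can be tried on a small range:

```python
import io
import sys

def print_all_chars(out=None, stop=sys.maxunicode + 1):
    """Write one character per line for code points 0 .. stop - 1."""
    out = sys.stdout if out is None else out
    for i in range(stop):
        if 0xD800 <= i <= 0xDFFF:
            continue  # lone surrogates cannot be encoded on their own
        print(chr(i), file=out)

# Try it on the first 128 code points, written to an in-memory buffer.
buf = io.StringIO()
print_all_chars(buf, stop=0x80)
```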

Upvotes: 15

Joran Beasley

Reputation: 114108

It is likely a problem with your terminal (cmd.exe is notoriously bad at this). Most of the time when you print, you are printing to a terminal, and the text has to be encoded for it. If you run your code in IDLE or some other environment that can render Unicode, you should see the characters. Also, you should not use eval; try this:

for uni_code in range(...):
    print hex(uni_code),unichr(uni_code)
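The encoding step described here can be reproduced without a real terminal: a text stream that uses the ASCII codec fails on U+0080 with the same kind of error, while an error handler such as 'backslashreplace' writes an escape sequence instead. A Python 3 sketch that uses io.TextIOWrapper over an in-memory buffer as a stand-in for an ASCII-only console (write_to_ascii_stream is a hypothetical helper name):

```python
import io

def write_to_ascii_stream(text, errors="strict"):
    """Encode text as an ASCII-only console would; return the bytes written."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii", errors=errors)
    stream.write(text)
    stream.flush()
    return raw.getvalue()

try:
    write_to_ascii_stream("\u0080")
except UnicodeEncodeError as exc:
    print(exc)  # the 'ascii' codec cannot encode '\x80'

print(write_to_ascii_stream("\u0080", errors="backslashreplace"))  # b'\\x80'
```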

Upvotes: 0
