Problems with unicode in Python

Question

I was having some trouble using Unicode in Python, so I wrote this program, and I am confused by the results. Each time I run it, different characters produce error #2, which means that UTF-32, UTF-16 and UTF-8 all raised errors when I tried to write a Unicode character to my test file. It is never the same characters twice. Is it a problem with my program, or am I doing something Python is not designed to handle?

for a in range(65535):
    try:
        open('test_text.txt','w').write(unichr(a).encode("utf32"))
        if len(open('test_text.txt','r').read()) == 0:
            print  unichr(a) + ' Error #1 #' + str(a)
    except IOError:
        try:
            open('test_text.txt','w').write(unichr(a).encode("utf16"))
        except IOError:
            try:
                open('test_text.txt','w').write(unichr(a).encode("utf8"))
            except IOError:
                print unichr(a) + ' Error #2 #' + str(a)
    except UnicodeEncodeError:
        print unichr(a) + ' Error #3 #' + str(a)
raw_input('\n\nEnter char to end:')

Matthew Wesly · Accepted Answer

Your code did not throw any errors when I tried it. Also, you're overwriting the file every time through the loop. You could try changing the mode to 'a' instead of 'w' to append to the file. Or you could open the file once, in binary mode, and write every character to it:

f = open('test_text.txt','wb')
for a in range(65535):
    f.write(unichr(a).encode("utf32"))
f.close()

There is more information about reading and writing files in Python here: http://docs.python.org/2/tutorial/inputoutput.html
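If the goal is to test each character individually, as the question's loop does, a round-trip check could look something like the sketch below. It is only an illustration under a few assumptions: the helper name roundtrip_failures and the temporary file location are made up for this example, and the unichr fallback is there so the same code also runs on Python 3. Each file is opened in binary mode and closed explicitly before it is read back, so the result does not depend on newline translation or on when the interpreter gets around to closing the file.

```python
import os
import tempfile

try:
    unichr            # Python 2 builtin
except NameError:
    unichr = chr      # Python 3: chr() already returns a Unicode string


def roundtrip_failures(path, code_points, encoding="utf-32"):
    """Return (code_point, reason) pairs for characters that do not survive
    being encoded, written to `path`, and read back. Hypothetical helper."""
    failures = []
    for cp in code_points:
        try:
            data = unichr(cp).encode(encoding)
        except UnicodeEncodeError:
            # For example, Python 3 will not encode lone surrogates.
            failures.append((cp, "encode"))
            continue
        # Binary mode: the bytes are written and read without any
        # newline or end-of-file translation by the platform.
        with open(path, "wb") as f:
            f.write(data)
        # The file is closed here, so the data is flushed before the read.
        with open(path, "rb") as f:
            if f.read() != data:
                failures.append((cp, "mismatch"))
    return failures


# Assumed location: a scratch file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "test_text.txt")
print(roundtrip_failures(path, range(0x100)))
```

Passing range(65535) instead of range(0x100) covers the same code points as the loop in the question; any entries in the returned list tell you which characters failed and whether the failure was in encoding or in the file round trip.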
