user2448592
user2448592

Reputation: 101

Printing Unicode Characters in Python (with '\x00')

I am trying to print unicode characters in python.

What is happening:

$ python -c "print u'TEXT'" | xxd
0000000: 5445 5854 0a                             TEXT.

Expected:

$ python -c "print u'TEXT'" | xxd
0000000: 5400 4500 5800 5400 0a                   T.E.X.T..

What am I doing wrong? Please help!

Upvotes: 0

Views: 1932

Answers (1)

jfs
jfs

Reputation: 414585

Python converts Unicode strings into bytes before printing. What you see is the correct output e.g., b'T' == b'\x54':

$ python -c"print u'TEXT'.encode('ascii')" | xxd
0000000: 5445 5854 0a                             TEXT.

Don't confuse Unicode string and a bytestring encoded in UTF-16 character encoding:

$ python -c"print u'TEXT'.encode('utf-16le')" | xxd
0000000: 5400 4500 5800 5400 0a                   T.E.X.T..

You could use PYTHONIOENCODING environment variable to change the character encoding used to encode the output for the whole script:

$ PYTHONIOENCODING=utf-16le python -c"print u'TEXT'" | xxd
0000000: 5400 4500 5800 5400 0a                   T.E.X.T..

Upvotes: 1

Related Questions