Reputation: 101
I am trying to print Unicode characters in Python.
What is happening:
$ python -c "print u'TEXT'" | xxd
0000000: 5445 5854 0a TEXT.
Expected:
$ python -c "print u'TEXT'" | xxd
0000000: 5400 4500 5800 5400 0a T.E.X.T..
What am I doing wrong? Please help!
Upvotes: 0
Views: 1932
Reputation: 414585
Python encodes Unicode strings into bytes before printing. What you see is the correct output: in an ASCII-compatible encoding each of these characters is a single byte, e.g., b'T' == b'\x54':
$ python -c"print u'TEXT'.encode('ascii')" | xxd
0000000: 5445 5854 0a TEXT.
Don't confuse a Unicode string with a bytestring encoded in the UTF-16 character encoding:
$ python -c"print u'TEXT'.encode('utf-16le')" | xxd
0000000: 5400 4500 5800 5400 0a T.E.X.T..
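The same comparison can be sketched inside Python, without the shell pipeline. This is only an illustration: it assumes Python 3 syntax (the commands above use the Python 2 print statement), and hex_bytes is a made-up helper name, not part of any library.

```python
import binascii

def hex_bytes(text, encoding):
    """Encode text with the given encoding and return the bytes as hex.

    Hypothetical helper, for illustration only.
    """
    return binascii.hexlify(text.encode(encoding)).decode('ascii')

# One byte per character in ASCII:
print(hex_bytes(u'TEXT', 'ascii'))     # 54455854
# Two bytes per character (low byte first) in UTF-16LE:
print(hex_bytes(u'TEXT', 'utf-16le'))  # 5400450058005400
```

The Unicode string is the same in both calls; only the encoding chosen at output time changes the bytes.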
You could use the PYTHONIOENCODING environment variable to change the character encoding used to encode the output for the whole script:
$ PYTHONIOENCODING=utf-16le python -c"print u'TEXT'" | xxd
0000000: 5400 4500 5800 5400 0a T.E.X.T..
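If setting an environment variable is not an option, a similar effect can be sketched in code by wrapping a byte stream in a text layer with an explicit encoding. This is a rough sketch assuming Python 3; the in-memory io.BytesIO is a stand-in for standard output, and the variable names are made up for illustration.

```python
import io

# In-memory byte stream standing in for stdout (an assumption of this sketch).
buf = io.BytesIO()

# Text layer that encodes everything written to it as UTF-16LE.
out = io.TextIOWrapper(buf, encoding='utf-16le', newline='\n')
out.write(u'TEXT\n')
out.flush()

data = buf.getvalue()
print(data)  # b'T\x00E\x00X\x00T\x00\n\x00'
```

Note that in this sketch the newline passes through the text layer too, so it is encoded as two bytes (0a 00) rather than the single 0a byte in the output above.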
Upvotes: 1