Cees Timmerman

Reputation: 19674

Printing escaped Unicode in Python

>>> s = 'auszuschließen'
>>> print(s.encode('ascii', errors='xmlcharrefreplace'))
b'auszuschlie&#223;en'
>>> print(str(s.encode('ascii', errors='xmlcharrefreplace'), 'ascii'))
auszuschlie&#223;en

Is there a prettier way to print any string without the b''?

EDIT:

I'm just trying to print escaped characters from Python, and my only gripe is that Python adds "b''" when I do that.

If I try to print the actual character in a dumb terminal like Windows 7's, I get this:

Traceback (most recent call last):
  File "Mailgen.py", line 378, in <module>
    marked_copy = mark_markup(language_column, item_row)
  File "Mailgen.py", line 210, in mark_markup
    print("TP: %r" % "".join(to_print))
  File "c:\python32\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2026' in position 29: character maps to <undefined>

Upvotes: 5

Views: 6732

Answers (4)

Mark Ransom

Reputation: 308520

>>> s='auszuschließen…'
>>> s
'auszuschließen…'
>>> print(s)
auszuschließen…
>>> b=s.encode('ascii','xmlcharrefreplace')
>>> b
b'auszuschlie&#223;en&#8230;'
>>> print(b)
b'auszuschlie&#223;en&#8230;'
>>> b.decode()
'auszuschlie&#223;en&#8230;'
>>> print(b.decode())
auszuschlie&#223;en&#8230;

You start out with a Unicode string. Encoding it to ASCII creates a bytes object containing the characters you want. print() can't output a bytes object without first converting it to a string, and the default conversion (its repr) adds the b and the quotes. Calling decode explicitly converts it back to a string; the default encoding is UTF-8, and since your bytes consist only of ASCII, which is a subset of UTF-8, it is guaranteed to work.
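If you do this in more than one place, the encode/decode round trip can be wrapped in a small helper. This is only a sketch; the name xml_escape_non_ascii is made up for illustration and is not part of the standard library.

```python
def xml_escape_non_ascii(text: str) -> str:
    """Return text with every non-ASCII character replaced by an XML
    numeric character reference (hypothetical helper name)."""
    # Encoding produces bytes; decoding as ASCII is safe because the
    # xmlcharrefreplace handler leaves only ASCII bytes behind.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(xml_escape_non_ascii("auszuschließen…"))
# -> auszuschlie&#223;en&#8230;
```

Decoding with 'ascii' rather than the default UTF-8 makes no difference to the result here; it just states the intent more explicitly.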

Upvotes: 3

jfs

Reputation: 414795

To see the ASCII representation (like repr() on Python 2) for debugging:

print(ascii('auszuschließen…'))
# -> 'auszuschlie\xdfen\u2026'

To print bytes:

import sys
sys.stdout.buffer.write('auszuschließen…'.encode('ascii', 'xmlcharrefreplace'))
# -> auszuschlie&#223;en&#8230;
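If you want the backslash-escaped form that ascii() shows, but without the surrounding quotes, the backslashreplace error handler gives the same escapes for non-ASCII characters. A minimal sketch (backslash_escape is a name invented for this example):

```python
def backslash_escape(text: str) -> str:
    """Replace non-ASCII characters with Python-style backslash escapes
    (hypothetical helper name)."""
    return text.encode("ascii", errors="backslashreplace").decode("ascii")


print(ascii("auszuschließen…"))
# -> 'auszuschlie\xdfen\u2026'
print(backslash_escape("auszuschließen…"))
# -> auszuschlie\xdfen\u2026
```

Unlike ascii(), this does not escape backslashes or quotes that are already in the string, so the two only agree for text like this example.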

Upvotes: 5

Lennart Regebro

Reputation: 172367

Not all terminals can handle more than some sort of 8-bit character set, that's true. But they won't handle that no matter what you do, really.

Printing a Unicode string will, assuming that your OS sets up the terminal properly, give the best result possible, which means that the characters the terminal cannot print will be replaced with some other character, such as a question mark. Doing that translation yourself will not really improve things.
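The traceback in the question shows that print() can also raise UnicodeEncodeError instead of substituting a character. If you would rather get the question-mark behaviour explicitly, a sketch like the following does the replacement yourself; to_encodable is a made-up name, and the target encoding is passed in rather than detected.

```python
def to_encodable(text: str, encoding: str) -> str:
    """Replace characters that `encoding` cannot represent with '?'
    (hypothetical helper name)."""
    # errors="replace" substitutes '?' when encoding; decoding with the
    # same codec turns the result back into a str that is safe to print.
    return text.encode(encoding, errors="replace").decode(encoding)


print(to_encodable("auszuschließen…", "ascii"))
# -> auszuschlie?en?
```

In a real script you would probably pass sys.stdout.encoding as the second argument, so that only the characters the console actually lacks are replaced.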

Update:

Since you want to know what characters are in the string, you actually want to know the Unicode codes for them, or the XML equivalent in this case. That's more inspecting than printing, and then usually the b'' part isn't a problem per se.

But you can get rid of it easily and hackily like so:

print(repr(s.encode('ascii', errors='xmlcharrefreplace'))[2:-1])
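One caveat with slicing the repr: repr() also escapes backslashes (and switches quote style when the data contains quotes), so the output can differ from the actual bytes. A small comparison, using a made-up sample string:

```python
s = "C:\\temp\\straße"  # sample text containing backslashes (illustrative only)
escaped = s.encode("ascii", errors="xmlcharrefreplace")

via_repr = repr(escaped)[2:-1]       # strips b' and ' from the repr
via_decode = escaped.decode("ascii")  # converts the bytes back to str

print(via_repr)
# -> C:\\temp\\stra&#223;e
print(via_decode)
# -> C:\temp\stra&#223;e
```

For strings without backslashes or quotes both approaches print the same thing.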

Upvotes: 1

Makoto

Reputation: 106498

Since you're using Python 3, you're afforded the ability to write print(s) to the console.

I can agree that, depending on the console, it may not be able to print properly, but I would imagine that most modern OSes since 2006 can handle Unicode strings without too much of an issue. I'd encourage you to give it a try and see if it works.

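Alternatively, you can declare the source file's encoding by placing this on the first or second line of the file (similar to a shebang):

# -*- coding: utf-8 -*-

This tells the interpreter to read the source file as UTF-8 (already the default in Python 3); it does not change the encoding used when printing to the console.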
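For the encoding of output specifically, one option is a text stream whose error handler does the escaping, so that plain print() calls never fail. The sketch below wraps an in-memory byte buffer to keep it self-contained; make_escaping_writer is a name invented for this example.

```python
import io


def make_escaping_writer(raw):
    """Wrap a binary stream so that non-ASCII text written to it is
    emitted as XML character references (hypothetical helper name)."""
    return io.TextIOWrapper(
        raw, encoding="ascii", errors="xmlcharrefreplace", newline="\n"
    )


buffer = io.BytesIO()  # stands in for a real byte stream
writer = make_escaping_writer(buffer)
print("auszuschließen…", file=writer)
writer.flush()
print(buffer.getvalue())
# -> b'auszuschlie&#223;en&#8230;\n'
```

On Python 3.7 and later, sys.stdout.reconfigure(errors="xmlcharrefreplace") applies the same error handler to the real standard output; that method is not available on the Python 3.2 shown in the question's traceback.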

Upvotes: 0
