Reputation: 19674
>>> s = 'auszuschließen'
>>> print(s.encode('ascii', errors='xmlcharrefreplace'))
b'auszuschließen'
>>> print(str(s.encode('ascii', errors='xmlcharrefreplace'), 'ascii'))
auszuschließen
Is there a prettier way to print any string without the b''
?
EDIT:
I'm just trying to print escaped characters from Python, and my only gripe is that Python adds "b''" when i do that.
If i wanted to see the actual character in a dumb terminal like Windows 7's, then i get this:
Traceback (most recent call last):
File "Mailgen.py", line 378, in <module>
marked_copy = mark_markup(language_column, item_row)
File "Mailgen.py", line 210, in mark_markup
print("TP: %r" % "".join(to_print))
File "c:\python32\lib\encodings\cp437.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2026' in position 29: character maps to <undefined>
Upvotes: 5
Views: 6732
Reputation: 308520
>>> s='auszuschließen…'
>>> s
'auszuschließen…'
>>> print(s)
auszuschließen…
>>> b=s.encode('ascii','xmlcharrefreplace')
>>> b
b'auszuschließen…'
>>> print(b)
b'auszuschließen…'
>>> b.decode()
'auszuschließen…'
>>> print(b.decode())
auszuschließen…
You start out with a Unicode string. Encoding it to ascii
creates a bytes
object with the characters you want. Python won't print it without converting it back into a string and the default conversion puts in the b
and quotes. Using decode
explicitly converts it back to a string; the default encoding is utf-8
, and since your bytes
only consist of ascii
which is a subset of utf-8
it is guaranteed to work.
Upvotes: 3
Reputation: 414795
To see ascii representation (like repr()
on Python 2) for debugging:
print(ascii('auszuschließen…'))
# -> 'auszuschlie\xdfen\u2026'
To print bytes:
sys.stdout.buffer.write('auszuschließen…'.encode('ascii', 'xmlcharrefreplace'))
# -> auszuschließen…
Upvotes: 5
Reputation: 172367
Not all terminals can handle more than some sort of 8-bit character set, that's true. But they won't handle that no matter what you do, really.
Printing a Unicode string will, assuming that your OS set's up the terminal properly, result in the best result possible, which means that the characters that the terminal can not print will be replaced with some character, like a question mark or similar. Doing that translation yourself will not really improve things.
Update:
Since you want to know what characters are in the string, you actually want to know the Unicode codes for them, or the XML equivalent in this case. That's more inspecting than printing, and then usually the b'' part isn't a problem per se.
But you can get rid of it easily and hackily like so:
print(repr(s.encode('ascii', errors='xmlcharrefreplace'))[2:-1])
Upvotes: 1
Reputation: 106498
Since you're using Python 3, you're afforded the ability to write print(s)
to the console.
I can agree that, depending on the console, it may not be able to print properly, but I would imagine that most modern OSes since 2006 can handle Unicode strings without too much of an issue. I'd encourage you to give it a try and see if it works.
Alternatively, you can enforce a coding by placing this before any lines in a file (similar to a shebang):
# -*- coding: utf-8 -*-
This will force the interpreter to render it as UTF-8.
Upvotes: 0