Reputation: 993
I'm trying to do the simplest thing in Python: open a file, read it, and close it. This is the code:
name_file = open("Forever.txt", encoding='UTF-8')
data = name_file.read()
name_file.close()
print (data)
I know that this text has emojis in it, such as hearts. The emojis are not written in their Unicode notation, like U+2600; they appear as little images. I think the following error is caused by these little images:
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f681' in
position 2333: character maps to <undefined>
I tried the following, without specifying the encoding:
name_file = open("Forever.txt")
And the error changed to this:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 2303: character maps to <undefined>
I have no idea why this is happening.
Maybe one solution would be to keep everything that is text in a variable and delete the rest.
Upvotes: 2
Views: 8735
Reputation: 177901
You are getting a UnicodeEncodeError, likely from your print statement. The file is being read and interpreted correctly, but you can only print characters that your console encoding and font actually support. The error indicates the character isn't supported in the current encoding.
For example:
Python 3.3.5 (v3.3.5:62cf4e77f785, Mar 9 2014, 10:35:05) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print('\U0001F681')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f681' in position 0: character maps to <undefined>
But print a character the terminal encoding supports, and it works:
>>> print('\U000000E0')
à
My console encoding was cp437, but if I use a Python IDE that supports UTF-8 encoding, then it works:
>>> print('\U0001f681')
🚁
You may or may not see the character correctly. You need to be using a font that supports the character; otherwise, you get some default replacement character.
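If the goal is only to keep print from raising, one option is to escape whatever the console encoding cannot represent before printing. The following is a minimal sketch, not a fix for the display itself: the helper name to_printable is hypothetical, and the fallback to 'ascii' when sys.stdout.encoding is unset is an assumption.

```python
import sys


def to_printable(text, encoding):
    """Return `text` with characters that `encoding` cannot represent
    replaced by backslash escapes such as \\U0001f681."""
    # Encode with the backslashreplace error handler, then decode again
    # so the result is a str that is safe to print with that encoding.
    return text.encode(encoding, errors='backslashreplace').decode(encoding)


if __name__ == '__main__':
    data = 'Forever \U0001F681'  # stand-in for the contents of the file
    # sys.stdout.encoding can be None when stdout is not a text stream;
    # falling back to 'ascii' is an assumption made for this sketch.
    console_encoding = sys.stdout.encoding or 'ascii'
    print(to_printable(data, console_encoding))
```

On a cp437 console this prints the escape sequence in place of the helicopter; on a UTF-8 console the text passes through unchanged.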
Upvotes: 6
Reputation: 602175
Without seeing your input file, it's hard to guess what encoding it's actually in. A text file containing "little images" isn't a meaningful description of the file format, though my guess is that your file actually is UTF-8 encoded, since opening it with that encoding works. Printing the data fails because the codec of your stdout (likely the codec of your terminal) isn't able to encode the emoji. You could try explicitly encoding in UTF-8, if your terminal supports that encoding:
import sys
sys.stdout.buffer.write(data.encode('utf-8'))
If your terminal doesn't support a codec that is able to display the emoji, then this is an inherent limitation of your terminal, and there is nothing you can do about it in the Python code.
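The guess that the file is UTF-8 also fits the UnicodeDecodeError from the question. The sketch below encodes the character named in the first error, U+1F681, as UTF-8 and then tries to decode those bytes with another codec. It assumes the default encoding used by open() was cp1252, which is common on Western-locale Windows but is not stated in the question; the helper name failing_byte is hypothetical.

```python
HELICOPTER = '\U0001F681'

# The UTF-8 encoding of U+1F681 is four bytes, the last of which is 0x81.
utf8_bytes = HELICOPTER.encode('utf-8')


def failing_byte(raw, encoding):
    """Return the first byte of `raw` that `encoding` cannot decode,
    or None if the whole byte string decodes cleanly."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first undecodable byte.
        return raw[exc.start]
    return None


if __name__ == '__main__':
    print(utf8_bytes)
    # cp1252 is an assumed default; byte 0x81 has no mapping in it.
    print(failing_byte(utf8_bytes, 'cp1252'))
    print(failing_byte(utf8_bytes, 'utf-8'))
```

Byte 0x81 is the same byte reported in the question's second traceback, which is consistent with a UTF-8 file being read through a single-byte Windows code page.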
Upvotes: 3