Reputation: 35
I am trying to write some data to a file. Depending on the data, I sometimes get a UnicodeEncodeError (UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f622' in position 141: character maps to <undefined>). I did some research and found out that I can encode the data I am writing with the encode function.
This is the code prior to modifying it (not supporting Unicode):
scriptDir = os.path.dirname(__file__)
path = os.path.join(scriptDir, filename)
with open(path, 'w') as fp:
    for sentence in iobTriplets:
        fp.write("\n".join("{} {} {}".format(triplet[0],triplet[1],triplet[2]) for triplet in sentence))
        fp.write("\n")
        fp.write("\n")
So I thought maybe I could just add the encoding when writing, like this:
fp.write("\n".join("{} {} {}".format(triplet[0],triplet[1],triplet[2]).encode('utf8') for triplet in sentence))
But that doesn't work; I get the following error: TypeError: sequence item 0: expected str instance, bytes found
I also tried opening the file in binary mode by adding a b after the w, but that didn't solve the problem either.
Does anybody know how to fix this? Btw: I am using Python 3.
Upvotes: 0
Views: 482
Reputation: 183
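You have already opened the file in text mode, which encodes every str you write automatically. There is no need to manually encode anything unless you are writing to a binary file. The 'charmap' error suggests that the default encoding on your platform is a legacy code page (probably something like cp1252 on Windows) that cannot represent the emoji, so the fix is to pick an encoding that can.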
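To see where the error comes from, here is a minimal sketch that prints the encoding open() falls back to in text mode and reproduces the failure. Note that cp1252 is only an assumption about your platform default; the question does not say which code page is in use.

```python
import locale

# Encoding that open() uses in text mode when no encoding argument is given.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

# Assumption: cp1252 stands in for the platform default from the question.
# It has no mapping for the emoji, so encoding fails.
error_message = None
try:
    "\U0001f622".encode("cp1252")
except UnicodeEncodeError as exc:
    error_message = str(exc)
    print(error_message)
```

If the first line printed is something other than a UTF encoding, any character outside that code page will trigger the same error when written.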
You can specify any supported encoding in open():
with open(path, 'w', encoding='utf-16be') as fp:
Unless the file is opened as binary, you need to remove the str.encode() call in the fp.write():
fp.write("\n".join("{} {} {}".format(triplet[0],triplet[1],triplet[2]) for triplet in sentence))
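Putting the two pieces together, a rough end-to-end sketch could look like the following. The function name write_iob, the sample triplets, and the temporary output path are made up for illustration, and utf-8 is used here instead of utf-16be simply because it is the more common choice; any encoding that covers the characters would do.

```python
import os
import tempfile


def write_iob(path, iob_triplets):
    """Write one triplet per line, with a blank line after each sentence."""
    # Text mode with an explicit encoding: str goes in, open() does the encoding.
    with open(path, "w", encoding="utf-8") as fp:
        for sentence in iob_triplets:
            fp.write("\n".join(
                "{} {} {}".format(triplet[0], triplet[1], triplet[2])
                for triplet in sentence
            ))
            fp.write("\n\n")


# Hypothetical sample data, including the emoji from the error message.
sample = [
    [("sad", "JJ", "O"), ("\U0001f622", "SYM", "O")],
    [("Berlin", "NNP", "B-LOC")],
]

out_path = os.path.join(tempfile.mkdtemp(), "triplets.txt")
write_iob(out_path, sample)

# Read the file back with the same encoding to check the round trip.
with open(out_path, encoding="utf-8") as fp:
    content = fp.read()
```

Whatever encoding you pick when writing, pass the same one when reading the file back, otherwise the text will be decoded incorrectly.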
Upvotes: 1