Reputation: 21

Python - Unicode file IO

I have a one-line txt file containing a bunch of Unicode characters with no spaces between them.

Example:

🅾🆖🆕Ⓜ🆙🆚🈁🈂

I want to output a txt file with one character on each line.

When I try to do this, I think I end up splitting the Unicode characters. How can I go about this?

Upvotes: 2

Answers (2)

wim

Reputation: 362707

There is no such thing as a text file with a bunch of unicode characters. It only makes sense to speak about a "unicode object" once the file has been read and decoded into Python objects. The data in the text file is bytes, encoded one way or another.

So, the problem is about reading the file in the correct way in order to decode the characters to unicode objects correctly.

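import io
enc_source = enc_target = 'utf-8'  # assumed; enc_source must match the file's real encoding
# Decode while reading, so the_line holds characters rather than raw bytes.
with io.open('my_file.txt', encoding=enc_source) as f:
    the_line = f.read().strip()
# Write each character followed by a newline, encoding on the way out.
with io.open('output.txt', mode='w', encoding=enc_target) as f:
    f.writelines([c + '\n' for c in the_line])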

Above, I am assuming the source and target file encodings are both utf-8. This is not necessarily the case, and you should know what the source file is encoded with. You get to choose enc_target, but somebody has to tell you enc_source (the file itself can't tell you).
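To illustrate why decoding has to come first, here is a minimal Python 3 sketch. It assumes UTF-8 and uses three of the characters from the question, written as escapes. The variable names are only for illustration. It shows that each character occupies several bytes, that cutting the raw bytes breaks a character apart, and that decoding with the wrong encoding silently gives you the wrong "characters".

```python
# Three characters from the question, written as escapes for portability.
text = "\U0001F17E\U0001F196\U0001F195"
data = text.encode("utf-8")

# Each of these characters takes 4 bytes in UTF-8.
n_chars, n_bytes = len(text), len(data)  # 3 characters, 12 bytes

# Slicing the raw bytes cuts through the middle of a character.
try:
    data[:2].decode("utf-8")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False

# Decoding with the right encoding yields one item per character.
chars = list(data.decode("utf-8"))

# Decoding with a wrong encoding does not fail here; it just yields
# one bogus character per byte.
wrong = data.decode("latin-1")
```

Splitting only after a correct decode is what keeps each character intact.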

Upvotes: 3

ForceBru

Reputation: 44838

This works in Python 3.5. Iterating over a str yields one character at a time, so join puts each one on its own line:

line = "😀👍"
with open("file.txt", "w", encoding="utf8") as f:
    f.write("\n".join(line))
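The snippet above writes a string literal. A sketch of the same idea applied to an input file might look like the following, assuming Python 3 and UTF-8 on both sides. The helper name one_char_per_line and the temporary file paths are hypothetical, chosen only for this example.

```python
import os
import tempfile


def one_char_per_line(src_path, dst_path, encoding="utf-8"):
    """Hypothetical helper: copy a one-line file, one character per line."""
    with open(src_path, encoding=encoding) as src:
        line = src.read().strip()
    with open(dst_path, "w", encoding=encoding, newline="\n") as dst:
        # join() iterates the str character by character.
        dst.write("\n".join(line) + "\n")
    return len(line)


# Build a small input file in a temporary directory, then convert it.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "input.txt")
dst_path = os.path.join(tmp_dir, "output.txt")
with open(src_path, "w", encoding="utf-8") as f:
    f.write("\U0001F600\U0001F44D\n")

count = one_char_per_line(src_path, dst_path)

with open(dst_path, encoding="utf-8") as f:
    lines = f.read().splitlines()
```

Passing the encoding explicitly avoids depending on the platform default, which is not always UTF-8.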

Upvotes: -1
