Reputation: 329
I ran into an issue when writing the header of a text file in Python 3. I have a header that contains Unicode AND newline characters. The following is a minimal working example:
with open('my_log.txt', 'wb') as my_file:
    str_1 = '\u2588\u2588\u2588\u2588\u2588\n\u2588\u2588\u2588\u2588\u2588'
    str_2 = 'regular ascii\nregular ascii'
    my_file.write(str_1.encode('utf8'))
    my_file.write(bytes(str_2, 'UTF-8'))
The above runs, but the output file appears to have no newlines (it looks as if I had replaced '\n' with ''). Like the following:
████████regular asciiregular ascii
I was expecting:
████
████
regular ascii
regular ascii
I have tried replacing '\n' with u'\u000A' and other characters based on similar questions - but I get the same result.
An additional, and probably related, question: I know I am making my life harder with the above encoding and byte methods. Still getting used to unicode in py3 so any advice regarding that would be great, thanks!
EDIT: Based on Ignacio's response and some more research, the following seems to produce the desired results (basically converting from '\n' to '\r\n' and ensuring the encoding is correct on all the lines):
with open('my_log.txt', 'wb') as my_file:
    str_1 = '\u2588\u2588\u2588\u2588\u2588\r\n\u2588\u2588\u2588\u2588\u2588'
    str_2 = '\r\nregular ascii\r\nregular ascii'
    my_file.write(str_1.encode('utf8'))
    my_file.write(str_2.encode('utf8'))
Upvotes: 1
Views: 2671
Reputation: 177901
Since you mentioned wanting advice using Unicode on Python 3...
You are probably using Windows, since the \n isn't working correctly for you in binary mode. Linux uses \n line endings for text, but Windows uses \r\n.
Open the file in text mode and declare the encoding you want, then just write the Unicode strings. Below is an example that includes different escape codes for Unicode:
#coding:utf8
str_1 = '''\
\u2588\N{FULL BLOCK}\U00002588█
regular ascii'''
with open('my_log.txt', 'w', encoding='utf8') as my_file:
    my_file.write(str_1)
You can use a four-digit escape \uxxxx, an eight-digit escape \Uxxxxxxxx, or a named escape \N{character name}. The Unicode characters can also be used directly in the file as long as the #coding: declaration is present and the source code file is saved in the declared encoding.
Note that the default source encoding for Python 3 is utf8, so the declaration I used above is optional, but on Python 2 the default is ascii. The source encoding does not have to match the encoding used to open a file.
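As a quick check of the escape forms mentioned above, here is a small sketch showing that they all produce the same one-character string. The variable names are only illustrative, and chr(0x2588) stands in for typing the character directly so that the snippet does not depend on how the source file is saved.

```python
import unicodedata

# Three ways of spelling U+2588 in a Python 3 string literal.
four_digit = '\u2588'
eight_digit = '\U00002588'
named = '\N{FULL BLOCK}'

# chr() builds the same character from its code point at run time.
from_code_point = chr(0x2588)

# All four are the same one-character string.
all_same = four_digit == eight_digit == named == from_code_point

# unicodedata.name() gives the name that the \N{...} escape expects.
char_name = unicodedata.name(four_digit)

print(all_same, char_name)
```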
Use w or wt for writing text (t is the default). On Windows, \n will translate to \r\n in text mode.
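If you want to see the translation without being on Windows, one option is the newline argument of open(). The sketch below assumes you want \r\n on every platform and writes to a temporary directory instead of my_log.txt, so the path and the sample text are placeholders.

```python
import os
import tempfile

# Placeholder text: two block characters, a newline, then ASCII.
text = '\u2588\u2588\nregular ascii\n'

# Write into a temporary directory so nothing in the working dir is touched.
path = os.path.join(tempfile.mkdtemp(), 'my_log.txt')

# In text mode, newline='\r\n' makes every '\n' written come out as '\r\n',
# regardless of the platform default.
with open(path, 'w', encoding='utf8', newline='\r\n') as my_file:
    my_file.write(text)

# Read the raw bytes back to see what actually landed on disk.
with open(path, 'rb') as my_file:
    raw = my_file.read()

print(raw)
```

Leaving newline at its default gives the platform's native line ending instead, which is usually what you want for a log file.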
Upvotes: 3
Reputation: 799082
'wb'
The file is open in binary mode. As such, \n isn't translated into the native newline format. If you open the file in a text editor that doesn't treat LF as a line break character then all the text will appear on a single line in the editor. Either open the file in text mode with an appropriate encoding or translate the newlines manually before writing.
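For the second option (translating manually while staying in binary mode), a minimal sketch might look like the following. The helper name to_crlf_bytes is made up for illustration, and the temporary path is a placeholder.

```python
import os
import tempfile


def to_crlf_bytes(text, encoding='utf8'):
    """Hypothetical helper: convert line endings to CRLF, then encode."""
    # Collapse any existing '\r\n' first so they are not doubled to '\r\r\n'.
    normalized = text.replace('\r\n', '\n')
    return normalized.replace('\n', '\r\n').encode(encoding)


header = '\u2588\u2588\u2588\nregular ascii\nregular ascii'

path = os.path.join(tempfile.mkdtemp(), 'my_log.txt')

# Binary mode writes the bytes exactly as given, so the CRLF pairs
# produced by the helper are what ends up in the file.
with open(path, 'wb') as my_file:
    my_file.write(to_crlf_bytes(header))

with open(path, 'rb') as my_file:
    written = my_file.read()

print(written)
```

Opening in text mode is generally simpler. This form is mainly useful when the file has to be opened in binary mode for other reasons.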
Upvotes: 1