Yannis

Reputation: 1711

Python handling newline and tab characters when writing to file

I am writing some text (which includes \n and \t characters), taken from one source file, to a (text) file; for example:

source file (test.cpp):

/*
 * test.cpp
 *
 *    2013.02.30
 *
 */

This text is read from the source file and stored in a string variable, like so:

test_str = "/*\n test.cpp\n *\n *\n *\n\t2013.02.30\n *\n */\n"

which, when I write it to a file using

with open('test.cpp', 'a') as out:
    print(test_str, file=out)

is written with the newline and tab characters turned into actual line breaks and tabs (exactly as test.cpp had them), whereas I want them to remain as \n and \t, exactly as the test_str variable holds them in the first place.

Is there a way in Python to write these 'special characters' to a file without them being translated?

Upvotes: 3

Views: 10949

Answers (4)

Hamza Abbad

Reputation: 666

Using str.encode with 'unicode_escape', as suggested in Jon Clements' answer, is not a good solution, because it escapes every non-ASCII character, which gives bad results for any text that is not plain English:

>>> t = 'English text\tTexte en Français\nنص بالعربية\t中文文本\n'
>>> t
'English text\tTexte en Français\nنص بالعربية\t中文文本\n'
>>> t.encode('unicode_escape').decode('utf-8')
'English text\\tTexte en Fran\\xe7ais\\n\\u0646\\u0635 \\u0628\\u0627\\u0644\\u0639\\u0631\\u0628\\u064a\\u0629\\t\\u4e2d\\u6587\\u6587\\u672c\\n'

As you can see, every non-ASCII character has been turned into an escape sequence, which is not the desired behaviour. The Python console, however, does not have this problem and displays non-ASCII characters perfectly.

To achieve something similar to what the Python console does, use the following code:

>>> repr(t).strip("'")
'English text\\tTexte en Français\\nنص بالعربية\\t中文文本\\n'

repr(t) does everything cleanly except that it adds single quote marks around the text, so we remove them using .strip("'").
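A slightly more robust variant of the same idea, offered as a sketch rather than as part of the original answer: repr() switches to double quotes when the text contains a single quote but no double quote, and in that case .strip("'") leaves the quotes in place. Slicing with [1:-1] always removes exactly the one pair of quotes that repr() adds. The helper name escape_visible is hypothetical, and the approach assumes Python 3, where repr() leaves printable non-ASCII characters unescaped.

```python
def escape_visible(text: str) -> str:
    """Hypothetical helper: show control characters as escape sequences.

    repr() escapes newlines, tabs, backslashes and other non-printable
    characters, but keeps printable non-ASCII text (accents, CJK) as-is.
    """
    # repr() always wraps its result in exactly one pair of quotes
    # (either ' or "), so dropping the first and last character removes it.
    return repr(text)[1:-1]


print(escape_visible('Texte en Français\n中文文本\t'))
# Texte en Français\n中文文本\t

# With an apostrophe in the text, repr() uses double quotes, which strip("'") misses:
s = "it's\n"
print(repr(s).strip("'"))   # "it's\n"   (quotes left in)
print(escape_visible(s))    # it's\n
```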

Upvotes: 0

jfs

Reputation: 414169

"I want them to remain \n and \t exactly like the test_str variable holds them in the first place."

test_str does NOT contain a backslash \ followed by t (two characters). It contains a single character, ord('\t') == 9 (the same character as in test.cpp). The backslash is special in Python string literals; e.g., u'\U0001f600' is NOT ten characters, it is a single character: 😀. Don't confuse a string object in memory at runtime with its text representation as a string literal in Python source code.
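A quick way to see the difference in the interpreter (a sketch added for illustration, not part of the original answer):

```python
# '\t' in source code is ONE character at runtime: the tab, code point 9.
tab = '\t'
print(len(tab), ord(tab))                   # 1 9

# A backslash followed by 't' (two characters) needs an escaped backslash
# in the literal, or a raw string literal.
two_chars = '\\t'
print(len(two_chars), two_chars == r'\t')   # 2 True

# Longer escapes are no different: this literal is a single character.
emoji = '\U0001f600'
print(len(emoji))                           # 1
```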

JSON may be a better (more portable) alternative to unicode-escape encoding for storing text, i.e., use:

import json

with open('test.json', 'w') as file:
    json.dump({'test.cpp': test_str}, file)

instead of test_str.encode('unicode_escape').decode('ascii').

To read json back:

with open('test.json') as file:
    test_str = json.load(file)['test.cpp']
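To see what actually ends up in the file, the same data can be round-tripped in memory with json.dumps()/json.loads(), the string counterparts of json.dump()/json.load(). This is a sketch reusing the question's test_str:

```python
import json

test_str = "/*\n test.cpp\n *\n *\n *\n\t2013.02.30\n *\n */\n"

# json.dumps() returns the same text that json.dump() would write to the file.
stored = json.dumps({'test.cpp': test_str})
print(stored)
# {"test.cpp": "/*\n test.cpp\n *\n *\n *\n\t2013.02.30\n *\n */\n"}

# The stored text holds backslash + 'n' / backslash + 't', not real newlines or tabs...
assert '\n' not in stored and '\t' not in stored
# ...and loading it back restores the original string exactly.
assert json.loads(stored)['test.cpp'] == test_str
```

By default json escapes non-ASCII characters as \uXXXX; passing ensure_ascii=False keeps them readable, in which case the file is best opened with an explicit encoding='utf-8'.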

Upvotes: 1

Jon Clements

Reputation: 142136

You can use str.encode:

with open('test.cpp', 'a') as out:
    print(test_str.encode('unicode_escape').decode('utf-8'), file=out)

This will replace every special character that Python recognises with its escape sequence.

Given your example:

>>> test_str = "/*\n test.cpp\n *\n *\n *\n\t2013.02.30\n *\n */\n"
>>> test_str.encode('unicode_escape')
b'/*\\n test.cpp\\n *\\n *\\n *\\n\\t2013.02.30\\n *\\n */\\n'
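The escaping is reversible, which matters if the file is ever read back. A minimal sketch of the round trip (the reverse step is an addition, not part of the original answer); since unicode_escape output is always ASCII, decode('ascii') works just as well as decode('utf-8'):

```python
test_str = "/*\n test.cpp\n *\n *\n *\n\t2013.02.30\n *\n */\n"

# Forward: str -> bytes containing escape sequences -> plain ASCII str to write out.
escaped = test_str.encode('unicode_escape').decode('ascii')
print(escaped)   # /*\n test.cpp\n *\n *\n *\n\t2013.02.30\n *\n */\n

# Reverse: escaped str -> bytes -> interpret the escape sequences again.
restored = escaped.encode('ascii').decode('unicode_escape')
assert restored == test_str
```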

Upvotes: 2

quapka

Reputation: 2929

Use replace(). And since you need to use it multiple times, you might want to look at this.

test_str = "/*\n test.cpp\n *\n *\n *\n\t2013.02.30\n *\n */\n"
with open("somefile", "w") as f:
    test_str = test_str.replace('\n','\\n')
    test_str = test_str.replace('\t','\\t')
    f.write(test_str)
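If the text can itself contain backslashes, replacing only \n and \t makes the result ambiguous: a literal backslash followed by n in the source would look exactly like an escaped newline. A sketch, assuming you also want backslashes and carriage returns escaped, does all the replacements in a single pass with str.maketrans()/str.translate(); the names ESCAPES and escape_controls are hypothetical:

```python
# Map each character to the text that should replace it. Escaping the
# backslash itself keeps the output unambiguous.
ESCAPES = str.maketrans({'\\': '\\\\', '\n': '\\n', '\t': '\\t', '\r': '\\r'})


def escape_controls(text: str) -> str:
    """Hypothetical helper: escape backslash, newline, tab and carriage return."""
    # translate() looks at each original character once, so the order of
    # the entries does not matter (unlike chained replace() calls, where
    # the backslash must be replaced first).
    return text.translate(ESCAPES)


test_str = "/*\n test.cpp\n *\n *\n *\n\t2013.02.30\n *\n */\n"
print(escape_controls(test_str))
```

Writing it out then works as before, e.g. f.write(escape_controls(test_str)) inside the with block.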

Upvotes: 2
