plakias

Reputation: 137

Python write valid json with newlines to file

Valid json expects escaped newline characters to be encoded as '\\n', with two backslashes. I have data that contains newline characters that I want to save to a file. Here's a simplified version:

data = {'mystring': 'Line 1\nLine 2'}

I can encode it with json.dumps():

import json
json_data = json.dumps(data)
json_data
# -> '{"mystring": "Line 1\\nLine 2"}'

When I print it, the newline displays as '\n', not '\\n' (which I find odd but I can live with):

print(json_data)
# -> {"mystring": "Line 1\nLine 2"}

However (here's the problem) when I output it to a file, the content of the file no longer contains valid json:

with open('mydata.json', 'w') as f:
    f.write(json_data)

If I open the file and read it, it contains this:

{"mystring": "Line 1\nLine 2"}

but I was hoping for this:

{"mystring": "Line 1\\nLine 2"}

Oddly (I think), if I read the file using python's open(), the json data is considered valid:

with open('mydata.json', 'r') as f:
    json_data = f.read()
json_data
# -> '{"mystring": "Line 1\\nLine 2"}'

... and it decodes OK:

json.loads(json_data)
# -> {u'mystring': u'Line 1\nLine 2'}

My question is: why is the data in the file not valid JSON? If another (non-Python) application needs to read it, it would probably be treated as incorrect. If I copy and paste the file contents and use json.loads() on them, it fails:

import json
json.loads('{"mystring": "Line 1\nLine 2"}')
# -> ValueError: Invalid control character at: line 1 column 21 (char 20)

Can anybody explain if this is the expected behaviour or am I doing something wrong?

Upvotes: 10

Views: 27747

Answers (3)

Joseph Marinier

Reputation: 1675

This is not an answer to the OP's question but to my question which led me here:

How do you load (arguably invalid) JSON with newlines within strings?

Use the strict=False option, available in json.load(), json.loads() or JSONDecoder().

For example:

json.loads('{"mystring": "Line 1\nLine 2"}', strict=False)
# -> {'mystring': 'Line 1\nLine 2'}

Here is the documentation for JSONDecoder:

If strict is false (True is the default), then control characters will be allowed inside strings. Control characters in this context are those with character codes in the 0-31 range, including '\t' (tab), '\n', '\r' and '\0'.
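As a rough sketch of the difference, the same text can be run through a strict and a non-strict decoder. The helper name try_parse is made up for this illustration; only json.JSONDecoder and its strict parameter come from the standard library.

```python
import json

# Text containing a real (unescaped) newline inside a JSON string.
raw_text = '{"mystring": "Line 1\nLine 2"}'

def try_parse(text, strict):
    """Hypothetical helper: return the parsed object, or None if decoding fails."""
    try:
        return json.JSONDecoder(strict=strict).decode(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None

strict_result = try_parse(raw_text, strict=True)    # rejected: control character
lenient_result = try_parse(raw_text, strict=False)  # accepted

print(strict_result)
print(lenient_result)
```

Note that strict=False only relaxes what Python accepts; other JSON parsers may still reject the same text.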

Upvotes: 0

Radu Diță

Reputation: 14171

The reason for this:

print(json_data)
# -> {"mystring": "Line 1\nLine 2"}

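is that json_data actually contains a single backslash followed by n. The interactive interpreter shows the string as a Python literal, in which a backslash has to be written as \\, whereas print() writes the actual characters, so you see \n.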

The data in the json file is valid, as the parser is able to parse it :)

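The confusion stems from the difference between how a string is written as a Python literal and what it actually contains. In a literal (and in the interpreter's echo of a value), \\n stands for the two characters \ and n, and those two characters are exactly what print() and f.write() output.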
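One way to check this without any display layer getting in the way is to read the file back as raw bytes. The following is a minimal sketch; the temporary directory and the variable names are chosen only for this illustration.

```python
import json
import os
import tempfile

data = {"mystring": "Line 1\nLine 2"}

# Write the JSON text to a file in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "mydata.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)

    # Read the raw bytes so that no decoding or repr formatting is involved.
    with open(path, "rb") as f:
        raw_bytes = f.read()

has_escaped_newline = b"\\n" in raw_bytes  # backslash (0x5c) followed by 'n'
has_real_newline = b"\n" in raw_bytes      # an actual line feed (0x0a)
round_tripped = json.loads(raw_bytes.decode("utf-8"))

print(has_escaped_newline, has_real_newline)
print(round_tripped == data)
```

If the file holds the backslash and the n as two separate bytes and no line feed, it is the escaped form that JSON requires, so a non-Python parser should accept it as well.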

Upvotes: 1

metatoaster

Reputation: 18898

You have overlooked the fact that \ is also the escape character in Python string literals. Try printing the last example instead of calling json.loads:

>>> print('{"mystring": "Line 1\nLine 2"}')
{"mystring": "Line 1
Line 2"}

The above cannot be valid JSON, because the string now contains a literal newline. What if the \ character is escaped correctly?

>>> print('{"mystring": "Line 1\\nLine 2"}')
{"mystring": "Line 1\nLine 2"}

Much better, you can then:

>>> json.loads('{"mystring": "Line 1\\nLine 2"}')
{'mystring': 'Line 1\nLine 2'}

Alternatively, if you want to be able to copy text from some other buffer and paste it into your live interpreter to decode it, consider using the raw prefix (r) for your string:

>>> print(r'{"mystring": "Line 1\nLine 2"}')
{"mystring": "Line 1\nLine 2"}
>>> json.loads(r'{"mystring": "Line 1\nLine 2"}')
{'mystring': 'Line 1\nLine 2'}

Note that in a raw string the \ no longer combines with the n to form a newline; both characters are kept exactly as typed.
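To make the difference between the two literals concrete, here is a small sketch that compares them directly. The variable names are illustrative only.

```python
import json

# Normal literal: Python converts \n into one real newline character.
normal_text = '{"mystring": "Line 1\nLine 2"}'
# Raw literal: the backslash and the 'n' are kept as two separate characters.
raw_text = r'{"mystring": "Line 1\nLine 2"}'

# The raw version is one character longer (backslash + n versus one newline).
length_difference = len(raw_text) - len(normal_text)

# The raw text is valid JSON and decodes to a string with a real newline.
decoded = json.loads(raw_text)

# The normal text contains a control character inside a string, so the
# default (strict) decoder rejects it.
try:
    json.loads(normal_text)
    normal_is_valid = True
except ValueError:
    normal_is_valid = False

print(length_difference, normal_is_valid)
print(decoded)
```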

Also see: How do I handle newlines in JSON? and note how this is not a problem that exists strictly within Python.

Upvotes: 7
