Reputation: 137
Valid json expects escaped newline characters to be encoded as '\\n', with two backslashes. I have data that contains newline characters that I want to save to a file. Here's a simplified version:
data = {'mystring': 'Line 1\nLine 2'}
I can encode it with json.dumps():
import json
json_data = json.dumps(data)
json_data
# -> '{"mystring": "Line 1\\nLine 2"}'
When I print it, the newline displays as '\n', not '\\n' (which I find odd but I can live with):
print(json_data)
# -> {"mystring": "Line 1\nLine 2"}
However (here's the problem) when I output it to a file, the content of the file no longer contains valid json:
f = open('mydata.json', 'w')
f.write(json_data)
f.close()
If I open the file and read it, it contains this:
{"mystring": "Line 1\nLine 2"}
but I was hoping for this:
{"mystring": "Line 1\\nLine 2"}
Oddly (I think), if I read the file using Python's open(), the JSON data is considered valid:
f = open('mydata.json', 'r')
json_data = f.read()
f.close()
json_data
# -> '{"mystring": "Line 1\\nLine 2"}'
... and it decodes OK:
json.loads(json_data)
# -> {u'mystring': u'Line 1\nLine 2'}
My question is: why is the data in the file not valid JSON? If another (non-Python) application needs to read it, it would probably be rejected as incorrect. If I copy and paste the file contents and call json.loads() on them, it fails:
import json
json.loads('{"mystring": "Line 1\nLine 2"}')
# -> ValueError: Invalid control character at: line 1 column 21 (char 20)
Can anybody explain whether this is the expected behaviour, or am I doing something wrong?
Upvotes: 10
Views: 27747
Reputation: 1675
This is not an answer to the OP's question but to my question which led me here:
How do you load (arguably invalid) JSON with newlines within strings?
Use the strict=False option, available in json.load(), json.loads() or JSONDecoder().
For example:
json.loads('{"mystring": "Line 1\nLine 2"}', strict=False)
# -> {'mystring': 'Line 1\nLine 2'}
Here is the documentation for JSONDecoder:
If strict is false (True is the default), then control characters will be allowed inside strings. Control characters in this context are those with character codes in the 0-31 range, including '\t' (tab), '\n', '\r' and '\0'.
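To make the difference concrete, here is a minimal sketch (Python 3 assumed; the variable names are mine, not from the docs) that feeds the same text to the default strict parser and to the lenient one:

```python
import json

# JSON text with a raw (unescaped) newline inside the string value.
# The '\n' in this Python literal becomes a real newline character.
raw_text = '{"mystring": "Line 1\nLine 2"}'

# Default (strict) parsing rejects control characters inside strings.
try:
    json.loads(raw_text)
    strict_ok = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    strict_ok = False

# strict=False accepts the raw newline and keeps it in the decoded value.
lenient = json.loads(raw_text, strict=False)

# The same option works on a decoder object.
via_decoder = json.JSONDecoder(strict=False).decode(raw_text)

print(strict_ok)
print(repr(lenient["mystring"]))
```

Bear in mind that this only changes how Python reads the text; other JSON parsers will most likely still reject a raw newline inside a string.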
Upvotes: 0
Reputation: 14171
The reason for this:
print(json_data)
# -> {"mystring": "Line 1\nLine 2"}
is that \\ is a valid escape sequence that ends up as a single backslash \ when the string is printed.
The data in the JSON file is valid, as the parser is able to parse it :)
The confusion stems from the fact that the interactive interpreter shows the repr of a string, in which every backslash is written as \\, while print() shows the actual characters. So what the interpreter displays as \\n is printed as \n: a backslash followed by the letter n, not a newline.
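A quick way to convince yourself is to count the backslashes. This is just a sketch reusing the data from the question (Python 3 assumed); the helper variable names are my own:

```python
import json

json_data = json.dumps({'mystring': 'Line 1\nLine 2'})

# The encoded text holds exactly one backslash (followed by the letter n)
# and no real newline character.
backslashes_in_text = json_data.count('\\')
has_real_newline = '\n' in json_data

# repr() is what the interactive interpreter echoes; it doubles the
# backslash so the result can be pasted back as a Python literal.
backslashes_in_repr = repr(json_data).count('\\')

print(json_data)        # the actual characters, as written to a file
print(repr(json_data))  # the Python-literal form of the same string
```

So the two backslashes only exist in the Python-literal spelling of the string, not in the string itself.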
Upvotes: 1
Reputation: 18898
You ran into the pitfall of neglecting the fact that the \ character is also the escape character in Python string literals. Try printing out the last example instead of calling json.loads:
>>> print('{"mystring": "Line 1\nLine 2"}')
{"mystring": "Line 1
Line 2"}
No way the above is valid JSON. What if the \ character is correctly escaped?
>>> print('{"mystring": "Line 1\\nLine 2"}')
{"mystring": "Line 1\nLine 2"}
Much better, you can then:
>>> json.loads('{"mystring": "Line 1\\nLine 2"}')
{'mystring': 'Line 1\nLine 2'}
Alternatively, if you really appreciate being able to copy some text from some other buffer and paste it into your live interpreter to decode it, you may consider using the r (raw) prefix for your string:
>>> print(r'{"mystring": "Line 1\nLine 2"}')
{"mystring": "Line 1\nLine 2"}
>>> json.loads(r'{"mystring": "Line 1\nLine 2"}')
{'mystring': 'Line 1\nLine 2'}
Note that the \ and the n that follows it are no longer automatically combined into a newline.
Also see: How do I handle newlines in JSON? and note how this is not a problem that exists strictly within Python.
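To check the file itself rather than a pasted literal, something like the following sketch can help. It assumes Python 3 and writes to a temporary directory (my choice, so nothing is left behind), but otherwise mirrors the question's mydata.json:

```python
import json
import os
import tempfile

data = {'mystring': 'Line 1\nLine 2'}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'mydata.json')

    # json.dump() writes the same text that json.dumps() returns.
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(data, f)

    # Read the raw text back to inspect what is really on disk.
    with open(path, 'r', encoding='utf-8') as f:
        file_text = f.read()

    # Decode the file with the default strict parser.
    with open(path, 'r', encoding='utf-8') as f:
        loaded = json.load(f)

# The file is a single line: the newline is stored as backslash + n.
line_count = len(file_text.splitlines())
print(file_text)
print(line_count)
```

The file holds one line with a single backslash before the n, which is exactly what JSON requires, and the strict parser reads it back to the original data.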
Upvotes: 7