Reputation: 1124
I am trying to parse a multiline JSON file using the json library in Python 2.7. A simplified sample file is given below:
{
    "observations": {
        "notice": [
            {
                "copyright": "Copyright Commonwealth of Australia 2015, Bureau of Meteorology. For more information see: http://www.bom.gov.au/other/copyright.shtml http://www.bom.gov.au/other/disclaimer.shtml",
                "copyright_url": "http://www.bom.gov.au/other/copyright.shtml",
                "disclaimer_url": "http://www.bom.gov.au/other/disclaimer.shtml",
                "feedback_url": "http://www.bom.gov.au/other/feedback"
            }
        ]
    }
}
My code is as follows:
import json
with open('test.json', 'r') as jsonFile:
    for jf in jsonFile:
        jf = jf.replace('\n', '')
        jf = jf.strip()
        weatherData = json.loads(jf)
        print weatherData
Nevertheless, I get an error as shown below:
Traceback (most recent call last):
File "test.py", line 8, in <module>
weatherData = json.loads(jf)
File "/home/usr/anaconda2/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/home/usr/anaconda2/lib/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/home/usr/anaconda2/lib/python2.7/json/decoder.py", line 380, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting object: line 1 column 1 (char 0)
Just to do some testing, I modified the code so that, after removing newlines and stripping the leading and trailing whitespace, it writes the contents to another file (with the json extension). Surprisingly, when I read the latter file back, I do not get any error and the parsing is successful. The modified code is as follows:
import json
filewrite = open('out.json', 'w+')
with open('test.json', 'r') as jsonFile:
    for jf in jsonFile:
        jf = jf.replace('\n', '')
        jf = jf.strip()
        filewrite.write(jf)
filewrite.close()
with open('out.json', 'r') as newJsonFile:
    for line in newJsonFile:
        weatherData = json.loads(line)
        print weatherData
The output is as follows:
{u'observations': {u'notice': [{u'copyright_url': u'http://www.bom.gov.au/other/copyright.shtml', u'disclaimer_url': u'http://www.bom.gov.au/other/disclaimer.shtml', u'copyright': u'Copyright Commonwealth of Australia 2015, Bureau of Meteorology. For more information see: http://www.bom.gov.au/other/copyright.shtml http://www.bom.gov.au/other/disclaimer.shtml', u'feedback_url': u'http://www.bom.gov.au/other/feedback'}]}}
Any idea what might be going on when newlines and whitespace are stripped before using the json library?
Upvotes: 7
Views: 27703
Reputation: 5108
You will go crazy if you try to parse a JSON file line by line. The json module has helper methods that read file objects or strings directly, i.e. the load and loads methods. load takes a file object (as shown below) for a file that contains JSON data, while loads takes a string that contains JSON data.
Option 1 (preferred):
import json
with open('test.json', 'r') as jf:
    weatherData = json.load(jf)
    print weatherData
Option 2:
import json
with open('test.json', 'r') as jf:
    weatherData = json.loads(jf.read())
    print weatherData
If you are looking for higher-performance JSON parsing, check out ujson.
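As a minimal sketch of why the whole-document approach works where the per-line loop fails: each physical line of a pretty-printed file is only a fragment of JSON, so json.loads rejects the very first one. The sample document here is a shortened stand-in for test.json, and parse_whole and first_line_parses are illustrative helper names, not part of the json module.

```python
import json

# Shortened stand-in for the contents of test.json (assumed for illustration).
SAMPLE = """{
    "observations": {
        "notice": [
            {"feedback_url": "http://www.bom.gov.au/other/feedback"}
        ]
    }
}"""


def parse_whole(text):
    """Parse a complete JSON document held in one string."""
    return json.loads(text)


def first_line_parses(text):
    """Return True if the first physical line is valid JSON by itself."""
    first_line = text.splitlines()[0].strip()
    try:
        json.loads(first_line)
        return True
    except ValueError:  # json.JSONDecodeError subclasses ValueError on Python 3
        return False


data = parse_whole(SAMPLE)
print(data["observations"]["notice"][0]["feedback_url"])
print(first_line_parses(SAMPLE))  # the first line is just "{"
```

The first line of the sample is a lone opening brace, which is exactly what the loop in the question hands to json.loads on its first iteration.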
Upvotes: 10
Reputation: 477
FYI, you can open both files in a single with statement:
with open('file_A') as in_, open('file_B', 'w+') as out_:
    # logic here
    ...
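For instance, the copy step from the question could be sketched with one with statement as below. The file names and the helper name join_lines are placeholders chosen for illustration; the function simply strips each line and writes the pieces to the output file.

```python
import json
import os
import tempfile


def join_lines(src_path, dst_path):
    """Copy src to dst with every line stripped and joined into one line."""
    # Both files are opened in one with statement and closed together.
    with open(src_path) as in_, open(dst_path, 'w') as out_:
        for line in in_:
            out_.write(line.strip())


# Small demonstration using temporary files (hypothetical paths).
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, 'file_A.json')
dst = os.path.join(tmp_dir, 'file_B.json')
with open(src, 'w') as f:
    f.write('{\n  "a": [\n    1,\n    2\n  ]\n}\n')

join_lines(src, dst)
with open(dst) as f:
    one_line = f.read()
print(json.loads(one_line))
```

Stripping each line is safe here because a JSON string cannot contain a raw newline, so no string value is ever split across lines.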
Upvotes: 2
Reputation: 3535
In the first snippet, you try to parse the file line by line, but no single line of a multiline file is a complete JSON document. You should parse it all at once; the easiest way is to use json.load(jsonFile). (The jf variable name is misleading, as it's a string.) So the correct way to parse it is:
import json
with open('test.json', 'r') as jsonFile:
    # json.load takes the file object; json.loads would need a string
    weatherData = json.load(jsonFile)
That said, it's a good idea to store the JSON on one line, as it's more concise.
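If one-line storage is what you want, a sketch along these lines produces the compact form with json.dumps and reads it back. to_one_line is an illustrative helper name, and the separators argument only trims the optional spaces after commas and colons.

```python
import json


def to_one_line(obj):
    """Serialize obj as compact JSON on a single line."""
    # separators drops the spaces json.dumps would otherwise emit.
    return json.dumps(obj, separators=(',', ':'), sort_keys=True)


record = {
    "observations": {
        "notice": [{"feedback_url": "http://www.bom.gov.au/other/feedback"}]
    }
}
line = to_one_line(record)
print(line)
print(json.loads(line) == record)
```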
In the second snippet, nothing is actually wrong: what you print is the Python representation of the parsed object, not JSON. The u'string here' notation for unicode strings is Python-specific, whereas valid JSON uses double quotation marks.
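To illustrate the difference, printing the parsed object shows Python's repr, whereas json.dumps serializes it back to JSON text with double quotes. This is a minimal sketch with a made-up one-key document:

```python
import json

parsed = json.loads('{"feedback_url": "http://www.bom.gov.au/other/feedback"}')

# repr() is Python notation: u'...' on Python 2.7, '...' on Python 3.
as_python = repr(parsed)

# json.dumps() produces JSON text, which always uses double quotes.
as_json = json.dumps(parsed)

print(as_python)
print(as_json)
```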
Upvotes: 6