I need some help parsing a JSON file. I've tried a couple of different ways to get the data I need. Below is a sample of the code and a section of the JSON data, but when I run the code I get the error listed above.
There are 500K lines of text in the JSON. It first fails about 1400 lines in, and I can't see anything in that section to indicate why.
I've run it successfully on blocks of JSON up to the first 1400 lines, and I've tried a different parser and got the same error.
I'm debating whether it's an error in the code, an error in the JSON, or a result of the JSON containing different kinds of data: some records (like the example below) are for a forklift and others are for fixed machines, but they are all structured just like the one below.
All help is sincerely appreciated.
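When `json.load` fails partway through a large file, the exception itself says where. A minimal sketch, assuming Python 3.5+ (where the decoder raises `json.JSONDecodeError` with `pos`, `lineno`, `colno` and `doc` attributes), that reports the line and column and shows the text around the failure. `report_json_error` is a hypothetical helper name. Note that a `UnicodeDecodeError` is different: it is raised while the file's bytes are being decoded, before any JSON parsing happens, so in that case the file encoding is the thing to check.

```python
import json

def report_json_error(text, context=40):
    """Try to parse text; on failure, describe where the parser stopped."""
    try:
        json.loads(text)
        return None  # parsed without error
    except json.JSONDecodeError as err:
        # err.pos is a character offset into err.doc; lineno/colno are 1-based
        start = max(err.pos - context, 0)
        snippet = err.doc[start:err.pos + context]
        return f"{err.msg} at line {err.lineno}, column {err.colno}: {snippet!r}"

# Illustrative input with a trailing comma in the second object
bad = '[\n  {"name": "Trukki_0001"},\n  {"name": "Trukki_0002",}\n]'
print(report_json_error(bad))
```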
Code:
import json
file_list = ['filename.txt'] #insert filename(s) here
for x in range(len(file_list)):
    with open(file_list[x], 'r') as f:
        distros_dict = json.load(f)
#list the headlines to be parsed
for distro in distros_dict:
    print(distro['name'], distro['positionTS'], distro['smoothedPosition'][0], distro['smoothedPosition'][1], distro['smoothedPosition'][2])
And here is a section of the JSON:
{
  "id": "b4994c877c9c",
  "name": "Trukki_0001",
  "areaId": "Tracking001",
  "areaName": "Ajoneuvo",
  "color": "#FF0000",
  "coordinateSystemId": "CoordSys001",
  "coordinateSystemName": null,
  "covarianceMatrix": [
    0.47,
    0.06,
    0.06,
    0.61
  ],
  "position": [
    33.86,
    33.07,
    2.15
  ],
  "positionAccuracy": 0.36,
  "positionTS": 1489363199493,
  "smoothedPosition": [
    33.96,
    33.13,
    2.15
  ],
  "zones": [
    {
      "id": "Zone001",
      "name": "Halli1"
    }
  ],
  "direction": [
    0,
    0,
    0
  ],
  "collisionId": null,
  "restrictedArea": "",
  "tagType": "VEHICLE_MANNED",
  "drivenVehicleId": null,
  "drivenByEmployeeIds": null,
  "simpleXY": "33|33",
  "EventProcessedUtcTime": "2017-03-13T00:00:00.3175072Z",
  "PartitionId": 1,
  "EventEnqueuedUtcTime": "2017-03-13T00:00:00.0470000Z"
}
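Because some records describe forklifts and others fixed machines, it is worth noting that a record missing one of the printed keys, or holding `null` where a list is expected, makes `distro['smoothedPosition'][0]` raise `KeyError` or `TypeError`, not a JSON parse error. A hedged sketch that pulls the printed fields out of one record with `dict.get`, assuming that missing values should simply come back as `None`. `extract_fields` is a hypothetical name:

```python
import json

def extract_fields(record):
    """Return (name, positionTS, x, y, z) from one tracking record.

    Missing keys, or a null/short smoothedPosition, yield None instead of raising.
    """
    pos = record.get("smoothedPosition") or []
    # Pad to three coordinates so the unpacking below never fails
    x, y, z = (list(pos) + [None, None, None])[:3]
    return record.get("name"), record.get("positionTS"), x, y, z

sample = json.loads('{"name": "Trukki_0001", "positionTS": 1489363199493, '
                    '"smoothedPosition": [33.96, 33.13, 2.15]}')
print(extract_fields(sample))
# → ('Trukki_0001', 1489363199493, 33.96, 33.13, 2.15)
```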
The actual problem was that the JSON file was encoded in UTF rather than ASCII. Changing the encoding with something like Notepad++ solves it.
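Instead of re-saving the file in an editor, you can tell Python which encoding to use when opening it. A minimal sketch, assuming the file is UTF-8. The `utf-8-sig` codec reads plain UTF-8 too, but also strips a leading byte-order mark, which some Windows editors add and which `json.load` otherwise rejects. `load_json_file` is a hypothetical name:

```python
import json

def load_json_file(path):
    # Without an explicit encoding, open() falls back to the platform default
    # (often cp1252 on Windows), which can fail on non-ASCII bytes.
    # "utf-8-sig" decodes UTF-8 and drops a leading BOM if there is one.
    with open(path, "r", encoding="utf-8-sig") as f:
        return json.load(f)
```

Usage would be `distros = load_json_file('filename.txt')` in place of the bare `open(..., 'r')` call.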
I'm guessing that your JSON is actually a list of objects, i.e. the whole stream looks like:
[
  { "x": 1, "y": 2 },
  { "x": 3, "y": 4 },
  ...
]
... with each element structured like the section you provided above. This is perfectly valid JSON: if I store it in a file named file.txt and paste your snippet between a set of [ ], thus making it a list, I can parse it in Python. Note, however, that the result will then be a Python list, not a dict, so you'd iterate over each list item like this:
import json
import pprint
file_list = ['file.txt']
# Just iterate over the file-list like this, no need for range()
for x in file_list:
    with open(x, 'r') as f:
        # distros is a list!
        distros = json.load(f)
    for distro in distros:
        print(distro['name'])
        print(distro['positionTS'])
        print(distro['smoothedPosition'][1])
        pprint.pprint(distro)
Edit: I moved the second for-loop into the loop over the files. This makes more sense: otherwise you iterate once over all files, keep only the last one in distros, and then print elements from that last file only. By nesting the loops, you iterate over all files, and for each file over all elements in its list. Hat-tip to the commenters for pointing this out!
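If you are not sure whether a given file holds a single object (like the snippet in the question) or a list of them, you can check the type that `json.load` returns and normalize it to a list before looping. A short sketch under that assumption; `as_record_list` is a hypothetical helper:

```python
import json

def as_record_list(data):
    """Wrap a single JSON object in a list; pass a JSON array through unchanged."""
    if isinstance(data, list):
        return data
    if isinstance(data, dict):
        return [data]
    raise TypeError(f"expected a JSON object or array, got {type(data).__name__}")

one = json.loads('{"name": "Trukki_0001"}')
many = json.loads('[{"name": "Trukki_0001"}, {"name": "Trukki_0002"}]')
for record in as_record_list(one) + as_record_list(many):
    print(record["name"])
```

With this in place, the same loop body works whichever shape each file has.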
Using the file provided, I got it to work by changing "distros_dict" to a list. In your code you reassign distros_dict instead of adding to it, so if more than one file were read, it would only hold the last one.
This is my implementation:
import json
file_list = ['filename.txt'] #insert filename(s) here
distros_list = []
for x in range(len(file_list)):
    with open(file_list[x], 'r') as f:
        # each file is assumed to hold a single JSON object
        distros_list.append(json.load(f))
#list the headlines to be parsed
for distro in distros_list:
    print(distro['name'], distro['positionTS'], distro['smoothedPosition'][0], distro['smoothedPosition'][1], distro['smoothedPosition'][2])
You will be left with a list of dictionaries.
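Note that `append` adds whatever `json.load` returns as one element. That gives a list of dictionaries when each file holds a single object, as in the question's snippet; if a file instead holds a JSON array, `append` adds a whole list, and `distro['name']` then fails with a `TypeError`. A sketch that handles both cases by extending with arrays and appending single objects, assuming every file is one or the other and is UTF-8 encoded; `load_all` is a hypothetical name:

```python
import json

def load_all(paths, encoding="utf-8"):
    """Collect the records from every file into one flat list of dicts."""
    records = []
    for path in paths:
        with open(path, "r", encoding=encoding) as f:
            data = json.load(f)
        if isinstance(data, list):
            records.extend(data)   # file holds an array of objects
        else:
            records.append(data)   # file holds a single object
    return records
```

It slots into the loop above as `for distro in load_all(file_list):`, with the same `print` call in the body.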