Reputation: 1393
The program needs to parse through a JSON lines file and store the data into an array. The only data that actually needs to be stored in the array is any value that comes after "SRC/Word1".
Here is an example of the JSON lines file:
{"Event UTC": "2020-12-21 05:23:06", "Event Time": "00:23:06:94", "SRC/Word1": " ", "Word2": " ", "Word3": " "}
{"Event UTC": "2020-12-21 05:30:53", "Event Time": "00:30:53:95", "SRC/Word1": "E1F25701", "Word2": "A29C7E68", "Word3": " "}
{"Event UTC": "2020-12-21 05:31:04", "Event Time": "00:31:04:34", "SRC/Word1": "E1F25701", "Word2": "D529F3D7", "Word3": " "}
{"Event UTC": "2020-12-21 10:18:54", "Event Time": "05:18:54:45", "SRC/Word1": "E15511D7", "Word2": "1F6FC55C", "Word3": " "}
Here is the code I have so far:
import json

data = []
with open('stela_zerrl_t01_201222_084053_test.json') as fin:
    for line in fin:
        data.append(json.loads(line))
print(data)
The resulting data array should contain something like data = ['E1F25701', 'E15511D7'].
Any idea of how to accomplish this?
Upvotes: 0
Views: 45
Reputation: 23825
See below (data represents the lines that were loaded from the file):
data = [{"Event UTC": "2020-12-21 05:23:06", "Event Time": "00:23:06:94", "SRC/Word1": " ", "Word2": " ", "Word3": " "},
{"Event UTC": "2020-12-21 05:30:53", "Event Time": "00:30:53:95", "SRC/Word1": "E1F25701", "Word2": "A29C7E68",
"Word3": " "},
{"Event UTC": "2020-12-21 05:31:04", "Event Time": "00:31:04:34", "SRC/Word1": "E1F25701", "Word2": "D529F3D7",
"Word3": " "},
{"Event UTC": "2020-12-21 10:18:54", "Event Time": "05:18:54:45", "SRC/Word1": "E15511D7", "Word2": "1F6FC55C",
"Word3": " "}]
data_sub_set = list(set(x["SRC/Word1"] for x in data if x["SRC/Word1"].strip()))
print(data_sub_set)
Output (the order of the elements may vary, because a set is unordered):
['E1F25701', 'E15511D7']
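If the order of first appearance matters, one option is to deduplicate with dict.fromkeys instead of set, since dictionaries preserve insertion order in Python 3.7+. This is a minimal sketch, not part of the answer above; the shortened data list and the name ordered_unique are assumptions made for illustration.

```python
# Shortened stand-in for the loaded lines (only the relevant key is kept here).
data = [
    {"SRC/Word1": " "},
    {"SRC/Word1": "E1F25701"},
    {"SRC/Word1": "E1F25701"},
    {"SRC/Word1": "E15511D7"},
]

# dict.fromkeys keeps one entry per key, in order of first appearance.
# The .strip() test drops values that are empty or whitespace-only.
ordered_unique = list(dict.fromkeys(
    x["SRC/Word1"] for x in data if x["SRC/Word1"].strip()
))
print(ordered_unique)  # ['E1F25701', 'E15511D7']
```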
Upvotes: 1
Reputation: 181
A JSON object just needs to be accessed like a dictionary. If you're looking for the SRC/Word1 field, then you ask for that:
import json

data = []
with open('stela_zerrl_t01_201222_084053_test.json') as fin:
    for line in fin:
        data.append(json.loads(line)['SRC/Word1'])  # note the key lookup here
print(data)
However, you may want to omit empty-string entries, or add some error handling if the JSON doesn't always have that field.
EDIT: just saw your "skip duplicates and omit empties" comment.
import json

data = []
with open('stela_zerrl_t01_201222_084053_test.json') as fin:
    for line in fin:
        value = json.loads(line).get('SRC/Word1', '')
        # keep the value only if it is not empty or all spaces,
        # and is not already present in the array
        if value.strip() and value not in data:
            data.append(value)
print(data)
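The error handling mentioned above could look roughly like the following. This is only a sketch under a few assumptions: the function name extract_src_words is made up for illustration, malformed or blank lines are silently skipped rather than reported, and values are stripped of surrounding whitespace before being compared. It takes any iterable of lines, so an open file object can be passed in directly.

```python
import json


def extract_src_words(lines, key="SRC/Word1"):
    """Return the unique, non-blank values of `key`, in order of first appearance.

    Hypothetical helper: blank lines, lines that are not valid JSON,
    and records without a string value for `key` are skipped.
    """
    seen = set()    # fast membership test for duplicates
    result = []     # preserves first-seen order
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines (assumption: ignore, don't fail)
        if not isinstance(record, dict):
            continue  # a valid JSON line that is not an object
        value = record.get(key, "")
        if not isinstance(value, str):
            continue  # assumption: only string values are of interest
        value = value.strip()
        if value and value not in seen:
            seen.add(value)
            result.append(value)
    return result


sample = [
    '{"Event UTC": "2020-12-21 05:23:06", "SRC/Word1": " ", "Word2": " "}',
    '{"Event UTC": "2020-12-21 05:30:53", "SRC/Word1": "E1F25701", "Word2": "A29C7E68"}',
    'this line is not JSON',
    '{"Event UTC": "2020-12-21 05:31:04", "SRC/Word1": "E1F25701", "Word2": "D529F3D7"}',
    '{"Event UTC": "2020-12-21 09:00:00", "Word2": "no SRC field here"}',
    '{"Event UTC": "2020-12-21 10:18:54", "SRC/Word1": "E15511D7", "Word2": "1F6FC55C"}',
]
print(extract_src_words(sample))  # ['E1F25701', 'E15511D7']
```

With the real file, the call would be extract_src_words(fin) inside the with open(...) block.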
Upvotes: 1