minTwin

Reputation: 1393

How to parse for specific unique values from a JSON lines file with Python and store into an array

The program needs to parse a JSON lines file and store some of the data in an array. The only data that actually needs to be stored is the value of the "SRC/Word1" field on each line.

Here is an example of the JSON lines file:

{"Event UTC": "2020-12-21 05:23:06", "Event Time": "00:23:06:94", "SRC/Word1": " ", "Word2": " ", "Word3": " "}
{"Event UTC": "2020-12-21 05:30:53", "Event Time": "00:30:53:95", "SRC/Word1": "E1F25701", "Word2": "A29C7E68", "Word3": " "}
{"Event UTC": "2020-12-21 05:31:04", "Event Time": "00:31:04:34", "SRC/Word1": "E1F25701", "Word2": "D529F3D7", "Word3": " "}
{"Event UTC": "2020-12-21 10:18:54", "Event Time": "05:18:54:45", "SRC/Word1": "E15511D7", "Word2": "1F6FC55C", "Word3": " "}

Here is the code I have so far:

import json

data = []
with open('stela_zerrl_t01_201222_084053_test.json') as fin:
    for line in fin:
        data.append(json.loads(line))
        print(data)

The data array should end up containing only the unique, non-blank values, something like data = ['E1F25701', 'E15511D7']

Any idea of how to accomplish this?

Upvotes: 0

Views: 45

Answers (2)

balderman

Reputation: 23825

See below (data represents the records that were loaded from the file):

data = [{"Event UTC": "2020-12-21 05:23:06", "Event Time": "00:23:06:94", "SRC/Word1": " ", "Word2": " ", "Word3": " "},
        {"Event UTC": "2020-12-21 05:30:53", "Event Time": "00:30:53:95", "SRC/Word1": "E1F25701", "Word2": "A29C7E68",
         "Word3": " "},
        {"Event UTC": "2020-12-21 05:31:04", "Event Time": "00:31:04:34", "SRC/Word1": "E1F25701", "Word2": "D529F3D7",
         "Word3": " "},
        {"Event UTC": "2020-12-21 10:18:54", "Event Time": "05:18:54:45", "SRC/Word1": "E15511D7", "Word2": "1F6FC55C",
         "Word3": " "}]
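# dict.fromkeys drops duplicates and keeps first-seen order; a plain set does not guarantee order
data_sub_set = list(dict.fromkeys(x["SRC/Word1"] for x in data if x["SRC/Word1"].strip()))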
print(data_sub_set)

Output:

['E1F25701', 'E15511D7']
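
To apply the same idea to the file itself rather than to a hard-coded list, a minimal sketch might look like the following. The function name unique_src_words and its field parameter are made up for illustration. It accepts any iterable of JSON lines, so an open file object should work as well, and it assumes every non-blank line is valid JSON containing the field.

```python
import json


def unique_src_words(lines, field="SRC/Word1"):
    """Return the distinct non-blank values of `field`, in first-seen order.

    Hypothetical helper: assumes each non-blank line is a JSON object
    that contains `field` with a string value.
    """
    # Parse every non-blank line and pull out the field of interest
    values = (json.loads(line)[field] for line in lines if line.strip())
    # dict.fromkeys removes duplicates while preserving insertion order
    return list(dict.fromkeys(v for v in values if v.strip()))


# Sample lines shortened from the question
sample = [
    '{"Event UTC": "2020-12-21 05:23:06", "SRC/Word1": " ", "Word2": " "}',
    '{"Event UTC": "2020-12-21 05:30:53", "SRC/Word1": "E1F25701", "Word2": "A29C7E68"}',
    '{"Event UTC": "2020-12-21 05:31:04", "SRC/Word1": "E1F25701", "Word2": "D529F3D7"}',
    '{"Event UTC": "2020-12-21 10:18:54", "SRC/Word1": "E15511D7", "Word2": "1F6FC55C"}',
]
print(unique_src_words(sample))  # ['E1F25701', 'E15511D7']
```

With a real file, the call would be something like unique_src_words(fin) inside the with open(...) as fin: block.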

Upvotes: 1

Mars Buttfield-Addison

Reputation: 181

A JSON object just needs to be accessed like a dictionary. If you're looking for the SRC/Word1 field, then you ask for that:

import json

data = []
with open('stela_zerrl_t01_201222_084053_test.json') as fin:
    for line in fin:
        data.append(json.loads(line)['SRC/Word1'])  # note the field access here
print(data)  # print once, after the whole file has been read

You may want to omit blank entries, though, or add some error handling if the JSON doesn't always have that field (indexing a missing key raises a KeyError).

EDIT: just saw your "skip duplicates and omit empties" comment.

import json

data = []
with open('stela_zerrl_t01_201222_084053_test.json') as fin:
    for line in fin:
        value = json.loads(line).get('SRC/Word1', '')
        # skip blank values (empty or all spaces) and values already in the array
        if value.strip() and value not in data:
            data.append(value)
print(data)  # print once, after the whole file has been read
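
For the error-handling side mentioned above, here is one possible sketch. The name collect_src_words is hypothetical, and silently skipping bad lines is an assumption about what you want; logging or raising may suit your data better. It also uses a set for the "already seen" check, which stays fast on large files.

```python
import json


def collect_src_words(lines, field="SRC/Word1"):
    """Collect unique, non-blank values of `field`, skipping bad lines.

    Hypothetical helper: malformed JSON, non-object lines, and lines
    without `field` are ignored rather than reported.
    """
    seen = set()   # fast membership test
    result = []    # keeps first-seen order
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # blank or malformed line
        if not isinstance(record, dict):
            continue  # valid JSON, but not an object
        value = record.get(field, "")
        # Keep only non-blank strings that have not been seen yet
        if isinstance(value, str) and value.strip() and value not in seen:
            seen.add(value)
            result.append(value)
    return result


lines = [
    '{"SRC/Word1": "E1F25701"}',
    'not json at all',
    '',
    '{"Word2": "no source field"}',
    '{"SRC/Word1": " "}',
    '{"SRC/Word1": "E1F25701"}',
    '{"SRC/Word1": "E15511D7"}',
]
print(collect_src_words(lines))  # ['E1F25701', 'E15511D7']
```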

Upvotes: 1
