maxnormal

Reputation: 75

Remove json objects based on key value using Python

EDIT: Forgot to mention I am using Python 2.7

I have a large JSON file structured like this:

[{
"headline": "Algérie Télécom prolonge son dispositif spécial Covid-19",
"url_src": "https://www.algerie360.com/algerie-telecom-prolonge-son-dispositif-special-covid-19/",
"img_src": "https://www.algerie360.com/wp-content/uploads/2020/04/DIA-Iddom-Algérie-télécom-320x200.jpg",
"news_src": "Algérie 360",
"catPT": "Ciência e Tecnologia",
"catFR": "Science et Technologie",
"catEN": "Science and Technology",
"lang": "French",
"epoch": 1591293345.817
},
{
"headline": "Internet haut débit à Alger : Lancement de la généralisation du  » fibre to home »",
"url_src": "https://www.algerie360.com/20200510-internet-haut-debit-a-alger-lancement-de-la-generalisation-du-fibre-to-home/",
"img_src": "https://www.algerie360.com/wp-content/uploads/2020/05/unnamed-320x200.jpg",
"news_src": "Algérie 360",
"catPT": "Ciência e Tecnologia",
"catFR": "Science et Technologie",
"catEN": "Science and Technology",
"lang": "French",
"epoch": 1591283345.817
},
...

I've been trying to write a .py script that opens my JSON file, removes every object whose "epoch" value is less than 1591293345.817, and then overwrites the original file.
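A minimal sketch of the filtering step on its own, assuming the file has already been parsed into a list of dicts. The helper name `drop_older_than` is hypothetical. With the threshold 1591293345.817, the first sample object (epoch equal to the threshold) is kept and the second (epoch 1591283345.817, 10,000 seconds earlier) is dropped.

```python
# Cutoff taken from the question.
THRESHOLD = 1591293345.817


def drop_older_than(records, threshold):
    """Hypothetical helper: return a new list without objects whose 'epoch' is below threshold."""
    return [record for record in records if record['epoch'] >= threshold]


# Two trimmed-down records mirroring the sample data (ASCII only, so this also runs on Python 2.7).
sample = [
    {"catEN": "Science and Technology", "epoch": 1591293345.817},
    {"catEN": "Science and Technology", "epoch": 1591283345.817},
]

kept = drop_older_than(sample, THRESHOLD)
print(len(kept))  # 1
```

Using `>=` keeps an object whose epoch equals the threshold, which matches "remove objects where epoch is less than" the value.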

Is this possible at all?

I've tried the following, but my Python knowledge is sketchy at best:

import time
import os
import json
import jsonlines

json_lines = []
with open('./json/news_done.json', 'r') as open_file:
    for line in open_file.readlines():
        j = json.loads(line)
        now = time.time()
        print(j['epoch'])
        lastWeek = now - 3600
        if not j['{epoch}'] > lastWeek:
            json_lines.append(line)

with open('./json/news_done.json', 'w') as open_file:
    open_file.writelines('\n'.join(json_lines))
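The file shown above is one pretty-printed JSON array spread over many lines, not one JSON object per line, so calling `json.loads` on each line cannot work. The sketch below uses a shortened, hypothetical copy of that layout to show that the first line alone (`[{`) is not valid JSON, while the whole text parses into a list of dicts.

```python
import json

# Shortened copy of the layout from the question: one array spanning several lines.
text = '''[{
"headline": "first",
"epoch": 1591293345.817
},
{
"headline": "second",
"epoch": 1591283345.817
}]'''

# Parsing line by line fails immediately: the first line is just "[{".
line_by_line_ok = True
try:
    json.loads(text.splitlines()[0])
except ValueError:  # Python 3's json.JSONDecodeError is a subclass of ValueError
    line_by_line_ok = False

# Parsing the whole text at once gives a list with one dict per object.
records = json.loads(text)
print(len(records))  # 2
```

Separately, `now - 3600` is one hour ago, not one week ago; a week is `7 * 24 * 3600` = 604800 seconds.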

Upvotes: 1

Views: 96

Answers (2)

Castlstream

Reputation: 21

Have you tried the pandas library? It makes it easy to filter rows by the value of a column.

I got this code snippet to work with your example data:

import pandas as pd

dataset = pd.read_json('example.json')
new_dataset = dataset[dataset['epoch'] >= 1591293345.817]

# to_json() already returns a JSON string, so write it directly;
# passing it to json.dump() would encode it a second time
with open('example.json', 'w') as f:
    f.write(new_dataset.to_json(orient='records'))
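`DataFrame.to_json` returns the JSON text as a string when no path is given, so that string should be written with `f.write` rather than passed to `json.dump`, which would wrap it in quotes and escape it. The sketch below demonstrates the difference with the standard library only; the literal assigned to `final_data` is a stand-in for what `to_json(orient='records')` might return.

```python
import json

# Stand-in for the string returned by new_dataset.to_json(orient='records').
final_data = '[{"epoch":1591293345.817}]'

double_encoded = json.dumps(final_data)  # what json.dump(final_data, f) writes
single_encoded = final_data              # what f.write(final_data) writes

reloaded_twice = json.loads(double_encoded)
reloaded_once = json.loads(single_encoded)

print(isinstance(reloaded_once, list))   # True
print(isinstance(reloaded_twice, list))  # False: it is the JSON text as a string
```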

Upvotes: 2

Ramtin Nouri

Reputation: 320

It looks like you're only removing the "epoch" tag, but if I've understood correctly you want to drop the whole element.

You can load the entire file as JSON instead of parsing it line by line:

import json
import time

with open('./json/news_done.json', 'r') as open_file:
    yourJson = json.load(open_file)  # the whole file is a single JSON list

cutoff = time.time() - 3600  # one hour ago; or use a fixed value such as 1591293345.817
filteredList = []
for j in yourJson:  # j is one whole element of the list, not one line
    if j['epoch'] >= cutoff:  # keep only elements that are not older than the cutoff
        filteredList.append(j)

with open('./json/news_done.json', 'w') as open_file:
    open_file.write(json.dumps(filteredList))
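Opening news_done.json with mode 'w' truncates it immediately, so an error halfway through writing would leave the file empty or partial. A common alternative, sketched here under the assumption that the directory is writable, is to write to a temporary file in the same directory and then rename it over the original. The helper name `write_json_atomically` is hypothetical.

```python
import json
import os
import tempfile


def write_json_atomically(path, data):
    """Hypothetical helper: dump data to a temp file next to path, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the temp file with owner-only permissions, which the
    # replaced file will inherit.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as tmp_file:
            json.dump(data, tmp_file)
        # os.replace (Python 3.3+) overwrites the target on POSIX and Windows;
        # Python 2.7 only has os.rename, which overwrites on POSIX but not on Windows.
        getattr(os, 'replace', os.rename)(tmp_path, path)
    except Exception:
        os.remove(tmp_path)
        raise
```

With the answer's variables, the final write would become `write_json_atomically('./json/news_done.json', filteredList)`.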

Upvotes: 1
