Emel

Reputation: 2442

Filtering a JSON file in Python and saving the entries that contain a certain word to another file

I have a JSON file that looks like this:

[
    {
        "id": 1,
        "first_name": "Clemens",
        "last_name": "Parramore",
        "email": "[email protected]",
        "gender": "Male",
        "ip_address": "223.150.139.137"
    },
    {
        "id": 1000,
        "first_name": "Theodore",
        "last_name": "Agostini",
        "email": "[email protected]",
        "gender": "Male",
        "ip_address": "6.131.228.196"
    }
]

I am trying to filter it so that every entry containing a certain word (here "google") is saved to a new JSON file. My method does not give the right result; the new file contains all the information from the original JSON:

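import json

f = open('./data/data.json', 'r')
data = json.load(f)  # parsed JSON: a list of dicts
fj = open('new.json', 'w')
# Split the string form of the parsed data on '{' and keep the pieces that mention "google"
for line in str(data).split('{'):
    if "google" in line:
        print(line, end="\n")
        fj.write(line)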

However, the printed output is what I want. I know the method is wrong; can someone help me? Thank you.
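The underlying problem can be seen in a small sketch (the two records and their email addresses are hypothetical placeholders). str(data) produces Python's repr of the parsed list, with single-quoted strings, so it is not JSON, and splitting that text on "{" yields loose fragments rather than whole records. Filtering the parsed dictionaries instead keeps each record intact and lets json.dumps produce valid JSON.

```python
import json

# Hypothetical records shaped like the question's data.
data = [
    {"id": 1, "email": "user@google.com"},
    {"id": 2, "email": "user@example.org"},
]

text = str(data)
print(text)  # Python repr: single-quoted strings, so not valid JSON

try:
    json.loads(text)
except json.JSONDecodeError as exc:
    print("not JSON:", exc.msg)

# Filtering the parsed objects keeps each matching record whole.
matches = [record for record in data if "google" in record["email"]]
print(json.dumps(matches))
```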

Upvotes: 2

Views: 1056

Answers (2)

Hugo G

Reputation: 16494

This should get you started. Please leave a comment if you need something more specific.

import json

to_be_saved = []
with open("data/data.json") as raw_data:
  data = json.load(raw_data)
  for entry in data:
    if "google" in entry["email"]:
      to_be_saved.append(entry)

    # Alternatively, if you only want to check the email domain,
    # use this instead of the lines above:

    # if "google" in entry["email"].split("@")[1]:
    #   to_be_saved.append(entry)

# print(to_be_saved)

with open("result.json", "w") as output_data:
  output_data.write(json.dumps(to_be_saved, indent=2))

Contents of result.json after running this script:

[
  {
    "id": 1,
    "first_name": "Clemens",
    "last_name": "Parramore",
    "email": "[email protected]",
    "gender": "Male",
    "ip_address": "223.150.139.137"
  }
]
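If the word may appear in any field rather than only in the email, the same idea extends to checking every string value. The following is a minimal sketch under that assumption; the helper name filter_json_file and the sample records are hypothetical, and the match is deliberately case-insensitive (unlike the answer above). json.dump writes straight to the file instead of building the string with json.dumps first.

```python
import json
import tempfile
from pathlib import Path


def filter_json_file(src, dst, word):
    """Copy every record of the JSON array in `src` that mentions `word`
    in any string field into `dst` as a new JSON array (hypothetical helper)."""
    with open(src, encoding="utf-8") as f:
        records = json.load(f)

    # A record matches if any of its string values contains the word, ignoring case.
    needle = word.lower()
    kept = [
        record
        for record in records
        if any(isinstance(v, str) and needle in v.lower() for v in record.values())
    ]

    with open(dst, "w", encoding="utf-8") as f:
        json.dump(kept, f, indent=2)
    return kept


# Usage with throwaway files; the sample records are hypothetical placeholders.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "data.json"
    dst = Path(tmp) / "new.json"
    src.write_text(json.dumps([
        {"id": 1, "first_name": "Clemens", "email": "cparramore@Google.com"},
        {"id": 1000, "first_name": "Theodore", "email": "tagostini@example.org"},
    ]), encoding="utf-8")
    kept = filter_json_file(src, dst, "google")
    saved = json.loads(dst.read_text(encoding="utf-8"))
```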

Upvotes: 2

Barbaros Özhan

Reputation: 65218

You can use the literal_eval function from the ast module to read the raw data, keep only the entries whose email domain is google, and then write them to another file with json.dumps, which pretty-prints the result:

import json
import ast

elm = []
with open('data.json') as f, open('new.json', 'w') as f_out:
    data = ast.literal_eval(f.read())
    for entry in data:
        val = entry['email'].split('@')[1]
        # Only mail addresses with "@google." syntax are kept in the result file;
        # "[email protected]" is not kept, as an example.
        if val[:val.find('.')] == 'google':
            elm.append(entry)
    f_out.write(json.dumps(elm, indent=4))
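One caveat with ast.literal_eval: it parses Python literals, not JSON, so it works on the question's sample only because that data contains no true, false, or null. Those JSON literals have no Python spelling (Python uses True, False, None), and literal_eval raises ValueError on them, while json.loads maps them correctly. The sketch below shows this with a hypothetical record, and uses str.rpartition for the domain check, which also avoids the edge case where find('.') returns -1 for a domain without a dot. The helper name is_google_address is illustrative only.

```python
import ast
import json

# Hypothetical record containing JSON literals that Python spells differently.
raw = '[{"id": 1, "email": "someone@google.com", "active": true, "manager": null}]'

try:
    ast.literal_eval(raw)
except ValueError:
    print("literal_eval cannot parse JSON true/false/null")

data = json.loads(raw)  # json maps true -> True and null -> None


def is_google_address(email):
    # Take the part after the last "@", then compare its first dot-separated label.
    domain = email.rpartition("@")[2]
    return domain.split(".")[0] == "google"


kept = [entry for entry in data if is_google_address(entry["email"])]
print(json.dumps(kept, indent=4))
```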

Upvotes: 2
