How to split one big json file into smaller multiple files?

Question

I have a huge JSON file with the contents below. To make it easier to understand, I removed a bunch of fields. The file has a templates JSON array that contains a number of JSON objects.

{
  "templates": [
    {
     "clientId": 1234,
     "key1": "value1",
     ...
    },
    {
     "clientId": 9876,
     "key2": "value2",
     ...
    },
    {
     "textGroup": 87,
     "key3": "value3",
     ...
    },
    {
     "textGroup": 90,
     "key4": "value4",
     ...
    }
  ]
}

Now I want to read this one big JSON file and split it into multiple smaller JSON files. The total number of smaller files should equal the number of JSON objects inside templates. As you can see, I have two clientId JSON objects and two textGroup JSON objects, so a total of 4 smaller JSON files will be generated.

Each JSON object should go into its own file, and the content of that file is that object wrapped in a templates array. Below is how each file should be generated.

The file name should be:

processConfig-client-1234.json

Content in that file:

{
  "templates": [
    {
     "clientId": 1234,
     "key1": "value1",
     ...
    }
   ]
}

Similarly for the others:

processConfig-client-9876.json

{
  "templates": [
    {
     "clientId": 9876,
     "key2": "value2",
     ...
    }
   ]
}

processConfig-textGroup-87.json

{
  "templates": [
    {
     "textGroup": 87,
     "key3": "value3",
     ...
    }
   ]
}

processConfig-textGroup-90.json

{
  "templates": [
    {
     "textGroup": 90,
     "key4": "value4",
     ...
    }
   ]
}

The file name will always be processConfig- followed by either client- or textGroup-, and then the value of that key (clientId or textGroup, respectively).
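The naming rule above can be written as a small helper that returns both the file name and the content for one object. This is a minimal sketch, assuming every object has at least one of the two keys; the function name output_for and the PREFIXES table are hypothetical. textGroup is checked before clientId, the same order the accepted answer uses, in case an object ever carries both.

```python
import json

# Hypothetical table: key found in a template -> label used in the file name.
# Checked in this order, so textGroup wins if an object has both keys.
PREFIXES = {"textGroup": "textGroup", "clientId": "client"}


def output_for(template):
    """Return (file_name, payload) for one object from the "templates" array."""
    for key, label in PREFIXES.items():
        if key in template:
            name = "processConfig-{}-{}.json".format(label, template[key])
            # Each output file holds a one-element "templates" array.
            return name, {"templates": [template]}
    raise KeyError("template has neither textGroup nor clientId")


name, payload = output_for({"clientId": 1234, "key1": "value1"})
print(name)  # processConfig-client-1234.json
print(json.dumps(payload, indent=2))
```

Keeping the key-to-label mapping in one table means a third kind of template only needs one new entry rather than another elif branch.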

Is this possible to do by any chance?

Błotosmętek · Accepted Answer

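import json

# json.load expects a file object, not a path, so open the file first.
with open('inputfile.json') as infile:
    dct = json.load(infile)

for subdct in dct['templates']:
    if "textGroup" in subdct:
        fname = "processConfig-textGroup-{}.json".format(subdct["textGroup"])
    elif "clientId" in subdct:
        fname = "processConfig-client-{}.json".format(subdct["clientId"])
    else:
        continue  # neither key present: no file name can be built, so skip it
    # Wrap the single object back into a "templates" array; indent=2 mirrors
    # the layout shown in the question.
    with open(fname, 'w') as f:
        json.dump({'templates': [subdct]}, f, indent=2)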

Of course, if your file is really huge (too big to fit into memory), this won't work, because json.load parses the whole file into memory at once.
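If the file really is too large for json.load, one option is to decode the templates array one element at a time. The sketch below uses only the standard library's json.JSONDecoder.raw_decode and assumes the first "[" in the file opens the templates array, which holds for the layout in the question. The function name iter_templates and its chunk_size parameter are hypothetical; a dedicated streaming parser such as the third-party ijson package is likely more robust for arbitrary layouts.

```python
import json


def iter_templates(path, chunk_size=64 * 1024):
    """Yield the objects of the top-level "templates" array one at a time.

    Assumes the first "[" in the file opens the templates array. Memory use is
    roughly one chunk of text plus the largest single element.
    """
    decoder = json.JSONDecoder()
    with open(path, encoding="utf-8") as f:
        buf = ""
        # Read until we reach the "[" that opens the templates array.
        while "[" not in buf:
            chunk = f.read(chunk_size)
            if not chunk:
                raise ValueError("no JSON array found in " + str(path))
            buf += chunk
        buf = buf[buf.index("[") + 1:]
        eof = False
        while True:
            # Drop whitespace and the commas that separate array elements.
            buf = buf.lstrip(" \t\r\n,")
            if buf.startswith("]"):
                return  # reached the end of the templates array
            if buf:
                try:
                    obj, end = decoder.raw_decode(buf)
                except json.JSONDecodeError:
                    if eof:
                        raise  # no more input, so the element is malformed
                else:
                    yield obj
                    buf = buf[end:]
                    continue
            elif eof:
                raise ValueError("templates array is not terminated")
            # The current element is incomplete, so read another chunk.
            chunk = f.read(chunk_size)
            if chunk:
                buf += chunk
            else:
                eof = True
```

Each yielded object can then go through the same if/elif naming logic as the accepted answer, so only one template has to be decoded and held at a time.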
