Reputation: 1946
I have a very large JSON array that I need to split into smaller arrays, writing each of those smaller arrays to its own file.
Sample Data
raw = '[{"id":"1","num":"2182","count":-17}{"id":"111","num":"3182","count":-202}{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'
Desired Output (In this example, split the data in half)
output_file1.json = [{"id":"1","num":"2182","count":-17},{"id":"111","num":"3182","count":-202}]
output_file2.json = [{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]
Current Code
import pandas as pd
import itertools
import json
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

raw = '[{"id":"1","num":"2182","count":-17}{"id":"111","num":"3182","count":-202}{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'

#split the data into manageable chunks + write to files
for i, group in enumerate(grouper(raw, 4)):
    with open('outputbatch_{}.json'.format(i), 'w') as outputfile:
        json.dump(list(group), outputfile)
Current Output of first file "outputbatch_0.json"
["[", "{", "\"", "s"]
I feel like I'm making this much harder than it needs to be.
Upvotes: 1
Views: 2047
Reputation: 2781
If you need exactly half of the data, you can use slicing:
import json
raw = '[{"id":"1","num":"2182","count":-17},{"id":"111","num":"3182","count":-202},{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'
json_data = json.loads(raw)
size_of_half = len(json_data) // 2  # integer division, so the result is usable as a slice index
print(json_data[:size_of_half])
print(json_data[size_of_half:])
This code does not deal with edge cases such as an odd length, where the two halves differ in size. In short, once the JSON is parsed you can do anything with it that you can do with a list.
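For the odd-length case mentioned above, one option is to round the midpoint up so that the first half receives the extra item. This is only a sketch; the helper name `split_in_half` is made up for illustration, and the sample data is shortened.

```python
import json

def split_in_half(items):
    """Split a list in two; the first half gets the extra item when the length is odd."""
    mid = (len(items) + 1) // 2  # round up instead of down
    return items[:mid], items[mid:]

raw = '[{"id":"1","count":-17},{"id":"111","count":-202},{"id":"222","count":12}]'
first, second = split_in_half(json.loads(raw))
print(len(first), len(second))  # 2 1
```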
Upvotes: 0
Reputation: 323
This assumes raw is valid JSON. The saving part is not worked out in detail.
import json
raw = '[{"id":"1","num":"2182","count":-17},{"id":"111","num":"3182","count":-202},{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'
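raw_list = json.loads(raw)  # parse with json rather than eval
# pair up consecutive items: (0, 1), (2, 3), ...
raw__zipped = list(zip(raw_list[0::2], raw_list[1::2]))
for i, item in enumerate(raw__zipped):
    # one file per pair; a single fixed name would be overwritten on every iteration
    with open('a_{}.json'.format(i), 'w') as f:
        json.dump(item, f)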
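Note that pairing with zip silently drops a trailing item when the list has an odd length. A slice-based variant avoids that and works for any chunk size; the sketch below uses a hypothetical helper named `chunked`, and each yielded chunk could be passed to json.dump as above.

```python
def chunked(items, n):
    """Yield consecutive slices of at most n items; the last slice may be shorter."""
    for start in range(0, len(items), n):
        yield items[start:start + n]

print(list(chunked([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
```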
Upvotes: 0
Reputation: 91
Assuming raw should be a valid JSON string (I added the missing commas), here is a simple but working solution.
import json

raw = '[{"id":"1","num":"2182","count":-17},{"id":"111","num":"3182","count":-202},{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'
json_data = json.loads(raw)

def split_in_files(json_data, amount):
    step = len(json_data) // amount
    pos = 0
    for i in range(amount - 1):
        with open('output_file{}.json'.format(i+1), 'w') as file:
            json.dump(json_data[pos:pos+step], file)
        pos += step
    # last one
    with open('output_file{}.json'.format(amount), 'w') as file:
        json.dump(json_data[pos:], file)

split_in_files(json_data, 2)
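With this approach the last file receives the whole remainder, so 10 items split into 4 files end up as 2, 2, 2 and 4. If more even sizes matter, the remainder can be spread over the first chunks instead. A rough sketch, where `split_evenly` is a hypothetical helper that only returns the chunks and leaves the writing to a loop like the one above:

```python
def split_evenly(items, amount):
    """Split items into `amount` consecutive chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), amount)
    chunks = []
    pos = 0
    for i in range(amount):
        # the first `extra` chunks take one additional item each
        size = base + (1 if i < extra else 0)
        chunks.append(items[pos:pos + size])
        pos += size
    return chunks

print([len(c) for c in split_evenly(list(range(10)), 4)])  # [3, 3, 2, 2]
```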
Upvotes: 2