lpd

Reputation: 51

How to solve a memory error when loading large (1GB) JSON files in Python?

I'm trying to convert JSON files to CSV, but I'm getting a memory error. Is there an efficient way to tune this code so it can process large JSON files in Python?

import sys
import json
import csv

def change(row, pastkeys=()):
    # Flatten nested dicts and lists into a flat dict keyed by tuples of keys
    result = {}
    for key in row:
        newkey = pastkeys + (key,)
        val = row[key]
        if isinstance(val, dict):
            result.update(change(val, newkey))
        elif isinstance(val, list):
            result.update(change(dict(zip(range(len(val)), val)), newkey))
        else:
            result[newkey] = val
    return result

a = open(sys.argv[1], 'r')
lines = list(a)
try:
    # Parse the whole file as a single JSON document
    data = json.loads(''.join(lines))
    if isinstance(data, dict):
        data = [data]
except ValueError:
    # Fall back to one JSON object per line
    data = [json.loads(line) for line in lines]

result = []
fields = set()
for row in data:
    flat = change(row)
    fields |= set(flat.keys())
    result.append(flat)

fields = sorted(fields)
out1 = open(sys.argv[2], 'w')
out = csv.writer(out1, lineterminator='\n')
out.writerow(['-'.join(str(f) for f in field) for field in fields])
for row in result:
    out.writerow([row.get(field, '') for field in fields])

a.close()
out1.close()

Upvotes: 1

Views: 6169

Answers (2)

fpilee

Reputation: 2098

You are loading the entire content of your file into one list (lines) and storing the results in another list (result).

Do not load the entire content of the file into memory unless you need some advantage from doing so, such as faster access (RAM vs. HDD).

Instead, you could process one line at a time: read it, process it, and append the output to your file, as sketched below.
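A minimal sketch of that idea, assuming the input is line-delimited JSON (one object per line) and reusing a flattening helper like the question's change(). The two-pass scheme for collecting column names is my own addition, not part of this answer; it keeps only one parsed row in memory at a time:

import sys
import json
import csv

def flatten(row, pastkeys=()):
    # same flattening idea as the question's change()
    result = {}
    for key, val in row.items():
        newkey = pastkeys + (key,)
        if isinstance(val, dict):
            result.update(flatten(val, newkey))
        elif isinstance(val, list):
            result.update(flatten(dict(enumerate(val)), newkey))
        else:
            result[newkey] = val
    return result

# First pass: collect the set of column names without keeping the rows.
fields = set()
with open(sys.argv[1]) as f:
    for line in f:
        fields |= set(flatten(json.loads(line)))
fields = sorted(fields)

# Second pass: write each row to the CSV as soon as it is parsed.
with open(sys.argv[1]) as f, open(sys.argv[2], 'w') as out_file:
    out = csv.writer(out_file, lineterminator='\n')
    out.writerow(['-'.join(str(k) for k in field) for field in fields])
    for line in f:
        row = flatten(json.loads(line))
        out.writerow([row.get(field, '') for field in fields])

This reads the file twice, but at no point does it hold more than one parsed row (plus the set of field names) in memory.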

Upvotes: 0

salomonderossi

Reputation: 2188

You could try ijson. It is a module that works with JSON as a stream rather than reading the whole file as one block; ijson is to JSON what SAX is to XML.

import ijson
for prefix, theType, value in ijson.parse(open(jsonFileName)):
    print prefix, theType, value
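For the question's case, where the top level is a JSON array of objects, ijson.items() yields one fully parsed element at a time instead of low-level parse events. A rough sketch of streaming that into CSV; the fixed field list here is an assumption for illustration, not something this answer specifies:

import sys
import csv
import ijson

fields = ['id', 'name', 'value']   # hypothetical, known set of columns

with open(sys.argv[1]) as f, open(sys.argv[2], 'w') as out_file:
    out = csv.writer(out_file, lineterminator='\n')
    out.writerow(fields)
    # 'item' matches each element of a top-level JSON array;
    # only one object is held in memory at a time
    for obj in ijson.items(f, 'item'):
        out.writerow([obj.get(field, '') for field in fields])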

Upvotes: 5
