Reputation: 603
I am running a python script which can be roughly summed (semi-psuedo-coded) as follows:
import pandas as pd

for json_file in json_files:
    # Read the file's JSON-lines records
    with open(json_file, 'r') as fin:
        data = fin.readlines()
    # Wrap the records in a JSON array so they parse as one DataFrame
    data_str = '[' + ','.join(x.strip() for x in data) + ']'
    df = pd.read_json(data_str)
    df.to_pickle('%s.pickle' % json_file)
    # Explicitly drop this iteration's references
    del df, data, data_str
The process works iteratively, creating a data frame for each file and saving it to its own pickle file. However, memory gets used up as the process runs, as if del df, data, data_str does not actually free it (originally I did not include the del statement at all, but I hoped adding it would resolve the issue -- it did not). Each iteration reads roughly the same amount of data into the data frame, about 3% of my available memory; with every iteration there is a corresponding 3% bump in %MEM (reported by ps u | grep [p]ython in my terminal), until eventually my memory is swamped and the process is killed. My question is: how should I change my code/approach so that the memory from each iteration is freed before the next one?
To note, I'm running Ubuntu 16.04 with Python 3.5.2 via Anaconda.
Thanks in advance for your direction.
Upvotes: 4
Views: 1124
Reputation: 314
In Python, automatic garbage collection deallocates objects once they are no longer referenced (a pandas DataFrame is just another object in this respect). There are different garbage collection strategies that can be tweaked, but that requires significant learning.
You can manually trigger garbage collection using
import gc
gc.collect()
But frequent calls to the garbage collector are discouraged, as collection is a costly operation and may affect performance.
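For example, a minimal sketch of how this could fit into the loop from your question (assuming json_files is defined as in your code):

import gc

import pandas as pd

for json_file in json_files:
    with open(json_file, 'r') as fin:
        data = fin.readlines()
    data_str = '[' + ','.join(x.strip() for x in data) + ']'
    df = pd.read_json(data_str)
    df.to_pickle('%s.pickle' % json_file)

    # Drop this iteration's references, then ask the collector to
    # reclaim anything still held in reference cycles before the
    # next file is processed.
    del df, data, data_str
    gc.collect()

Here the collection runs once per file rather than more frequently, which keeps the overhead modest relative to the cost of reading and pickling each file.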
Upvotes: 2