Reputation: 1730
Info about my df:
RangeIndex: 14151145 entries, 0 to 14151144
Data columns (total 4 columns):
id     object
idf    object
ch     object
hr     uint8
dtypes: object(3), uint8(1)
memory usage: 337.4+ MB
My system has 120 GB of memory, and when I run:
dfp = df.pivot_table(index='id', columns=['idf', 'ch'], aggfunc='count')
The resulting pivot table will have 10,800 columns.
Memory consumption climbs to around 35 GB and then I get a MemoryError. I can't understand this, since I still have plenty of free memory.
I am running the code in a Jupyter Notebook.
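For a rough sense of scale, here is a back-of-the-envelope estimate of the result's size, assuming it is materialized with dense 8-byte cells (the intermediate frames pandas builds while unstacking can be larger still):

n_ids = df['id'].nunique()                  # rows of the pivot = unique ids
n_cols = 10800                              # columns of the pivot, as stated above
print(n_ids * n_cols * 8 / 2**30, "GiB")    # dense 8-byte cells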
Upvotes: 3
Views: 1585
Reputation: 1730
I couldn't find anything that would help me process all of my data in one go, so I sliced my df into n pieces with respect to the ids (each id can have multiple samples).
def partition(lst, n):
    division = len(lst) / float(n)
    return [lst[int(round(division * i)): int(round(division * (i + 1)))]
            for i in range(n)]
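For example, splitting ten ids into three roughly equal consecutive slices:

partition(list(range(10)), 3)
# [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]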
import gc

import pandas as pd

chunks_df = pd.DataFrame()
ids = dt_m['id'].unique()
part_ids = partition(ids, 5)

gc.collect()
for i, lst in enumerate(part_ids):
    # pivot only the rows belonging to this slice of ids, then append the result;
    # PIVOT_OPERATION() stands for the pivot_table call from the question
    chunks_df = chunks_df.append(dt_m[dt_m['id'].isin(lst)].PIVOT_OPERATION())
    print("{} batch done".format(i))
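For reference, a self-contained sketch of the same approach, with the placeholder filled in by the pivot_table call from the question (an assumption on my part) and the per-iteration append replaced by a single pd.concat at the end, since append copies the accumulated frame on every pass:

import gc

import pandas as pd

def partition(lst, n):
    # split lst into n roughly equal consecutive slices
    division = len(lst) / float(n)
    return [lst[int(round(division * i)): int(round(division * (i + 1)))]
            for i in range(n)]

ids = dt_m['id'].unique()
pieces = []
for i, id_slice in enumerate(partition(ids, 5)):
    subset = dt_m[dt_m['id'].isin(id_slice)]
    # assumed to be the same pivot as in the question
    pieces.append(subset.pivot_table(index='id', columns=['idf', 'ch'],
                                     aggfunc='count'))
    del subset
    gc.collect()
    print("{} batch done".format(i))

# ids are disjoint across slices, so the chunks stack row-wise;
# columns missing from a chunk are filled with NaN by concat
dfp = pd.concat(pieces)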
Upvotes: 1