Sebastian Jose

Reputation: 309

How to work with a DataFrame which cannot be transformed by pandas pivot due to excessive memory usage?

I have a dataframe with this structure:

[screenshot: original dataframe structure]

I built this dfp with 100 rows of the original for testing:

[screenshot: dfp sample of 100 rows]

Then I tried a pivot operation to get a dataframe like this:

[screenshot: desired pivoted dataframe]

The problem with the pivot operation on the full data is that the result would have 131,209 rows and 39,123 columns. When I try the operation, memory fills up and my PC restarts.

I tried segmenting the dataframe into chunks of 10 or 20. The pivot works, but the subsequent concat operation exhausts the memory again.

[screenshot]
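For reference, the chunked pivot-then-concat approach described above can be sketched as follows, on tiny synthetic data and assuming the question's column names (order_id, product_id, my_column):

```python
# Minimal sketch of the chunked approach: split the product_id values,
# pivot each subset separately, then concatenate column-wise.
import numpy as np
import pandas as pd

dfp = pd.DataFrame({
    "order_id":   [1, 1, 2, 3],
    "product_id": [10, 20, 20, 30],
    "my_column":  [1, 2, 3, 4],
})

products = dfp["product_id"].unique()
pieces = []
for chunk in np.array_split(products, 2):
    part = dfp[dfp["product_id"].isin(chunk)]
    pieces.append(
        part.pivot(index="order_id", columns="product_id", values="my_column")
    )
# This final concat is the step that blows up memory on the real data,
# because every piece is densified to the full set of order_id rows.
dfs = pd.concat(pieces, axis=1).fillna(0)
```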

My PC has 16 GB of memory. I have also tried Google Colab, but it runs out of memory there too.

Is there a data format or another strategy for performing this operation?

Upvotes: 0

Views: 69

Answers (1)

Baron Legendre

Reputation: 2188

You may try this:

dfu = dfp.groupby(['order_id','product_id'])[['my_column']].sum().unstack().fillna(0)
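A minimal runnable sketch of this groupby/unstack approach, using tiny hypothetical data with the question's column names:

```python
# groupby + unstack: sum duplicates per (order_id, product_id),
# then move product_id from the index into the columns.
import pandas as pd

dfp = pd.DataFrame({
    "order_id":   [1, 1, 2, 2, 3],
    "product_id": [10, 20, 10, 30, 20],
    "my_column":  [1.0, 2.0, 3.0, 4.0, 5.0],
})

dfu = (
    dfp.groupby(["order_id", "product_id"])[["my_column"]]
       .sum()
       .unstack()
       .fillna(0)
)
# Result: one row per order_id, one column per product_id,
# with MultiIndex columns like ('my_column', 10).
```

Note that unlike DataFrame.pivot, this tolerates duplicate (order_id, product_id) pairs, since groupby sums them first.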

Another way is to split the product_id values into parts, process each part separately, and concatenate the results back together:

import pandas as pd

front_part = []  # fill with the first half of the product_id values
rear_part = []   # fill with the remaining product_id values
dfp_f = dfp[dfp['product_id'].isin(front_part)]
dfp_r = dfp[dfp['product_id'].isin(rear_part)]

dfs_f = dfp_f.pivot(index='order_id', columns='product_id', values=['my_column']).fillna(0)
dfs_r = dfp_r.pivot(index='order_id', columns='product_id', values=['my_column']).fillna(0)
dfs = pd.concat([dfs_f, dfs_r], axis=1)

Here front_part and rear_part split product_id into two groups; you need to list the actual discrete product_id values in each list.
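Since the full result (~131,209 × 39,123) would need tens of gigabytes as a dense frame, a sparse representation may be the only thing that fits in 16 GB. A hedged sketch, assuming the question's column names, that builds a scipy sparse matrix from category codes instead of pivoting:

```python
# Sparse alternative to pivot: map order_id/product_id to integer codes
# and build a COO matrix, which stores only the nonzero entries.
import pandas as pd
from scipy import sparse

dfp = pd.DataFrame({
    "order_id":   [1, 1, 2, 3],
    "product_id": [10, 20, 20, 30],
    "my_column":  [1.0, 2.0, 3.0, 4.0],
})

rows = dfp["order_id"].astype("category")
cols = dfp["product_id"].astype("category")
mat = sparse.coo_matrix(
    (dfp["my_column"], (rows.cat.codes, cols.cat.codes)),
    shape=(rows.cat.categories.size, cols.cat.categories.size),
).tocsr()  # duplicate (row, col) pairs are summed on conversion
```

If a pandas interface is still needed, the matrix can be wrapped without densifying via pd.DataFrame.sparse.from_spmatrix(mat, index=rows.cat.categories, columns=cols.cat.categories).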

Upvotes: 1
