Reputation: 309
I have a dataframe with this structure:
I built this dfp with 100 rows of the original for testing,
and then I tried a pivot operation to get a dataframe like this:
The problem with the pivot operation on the full data is that the result would have 131209 rows and 39123 columns. When I try the operation, the memory collapses and my PC restarts.
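The pivot I attempt is roughly this (a minimal sketch, using the column names that appear in the answer below):

dfs = dfp.pivot(index='order_id', columns='product_id', values='my_column').fillna(0)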
I tried segmenting the dataframe into 10 or 20 chunks. The pivot works on each chunk, but when I concat the pieces back together it crashes the memory again.
My PC has 16 GB of memory. I have also tried Google Colab, but it runs out of memory there too.
Is there a format or another strategy that would make this operation work?
Upvotes: 0
Views: 69
Reputation: 2188
You may try this:
# Aggregate my_column per (order_id, product_id) pair, then spread product_id across the columns
dfu = dfp.groupby(['order_id','product_id'])[['my_column']].sum().unstack().fillna(0)
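If even the aggregated result is too big to hold densely, you could build it as a sparse matrix instead. This is only a rough sketch of that idea (it assumes my_column is numeric and uses scipy, which is not part of the line above):

import pandas as pd
from scipy.sparse import coo_matrix

# Map order_id / product_id values to integer row / column positions
orders = pd.Categorical(dfp['order_id'])
products = pd.Categorical(dfp['product_id'])

# COO stores only the observed cells; duplicate (row, col) pairs are summed on conversion
mat = coo_matrix(
    (dfp['my_column'].to_numpy(dtype=float), (orders.codes, products.codes)),
    shape=(len(orders.categories), len(products.categories)),
).tocsr()

# Optionally wrap it back into a DataFrame with sparse columns
dfu = pd.DataFrame.sparse.from_spmatrix(
    mat, index=orders.categories, columns=products.categories
)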
Another way is to split product_id into parts, process each part separately, and concatenate the results back together:
import pandas as pd

# Fill these lists with the actual product_id values for each half
front_part = []
rear_part = []

# Pivot each half separately, then join the two results column-wise
dfp_f = dfp[dfp['product_id'].isin(front_part)]
dfp_r = dfp[dfp['product_id'].isin(rear_part)]
dfs_f = dfp_f.pivot(index='order_id', columns='product_id', values=['my_column']).fillna(0)
dfs_r = dfp_r.pivot(index='order_id', columns='product_id', values=['my_column']).fillna(0)
dfs = pd.concat([dfs_f, dfs_r], axis=1)
front_part and rear_part mean we want to separate product_id into two parts, but you need to fill the lists with the actual discrete product_id values.
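The same idea generalises to more than two parts. Here is a sketch of an n-chunk loop (the chunk count of 20 and the sparse cast are my own assumptions, added so the final concat stays small in memory):

import pandas as pd
import numpy as np

# Split the distinct product ids into n chunks; n=20 is a guess, raise it if memory is tight
products = dfp['product_id'].unique()
all_orders = dfp['order_id'].unique()

parts = []
for chunk in np.array_split(products, 20):
    sub = dfp[dfp['product_id'].isin(chunk)]
    piv = sub.pivot(index='order_id', columns='product_id', values=['my_column']).fillna(0)
    # Reindex so every part shares the same rows, then store it sparsely
    piv = piv.reindex(all_orders, fill_value=0).astype(pd.SparseDtype('float', 0.0))
    parts.append(piv)

dfs = pd.concat(parts, axis=1)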
Upvotes: 1