Deekshith Hari
Deekshith Hari

Reputation: 50

python pandas: add missing date after groupby and calculate the quantity

I have a dataframe like below

   stock__stock__FINCODE  quantity   price transcation_type       month
0                 100112    1000.0  2000.0              buy  2022-06-01
1                 100112     500.0  2000.0              buy  2022-06-01
2                 100112    1200.0  3000.0             sell  2022-08-01
3                 100112     100.0  3000.0             sell  2022-08-01
4                 100112     200.0  3000.0             sell  2022-08-01
5                 100325     100.0  1500.0              buy  2022-06-01

I did groupby with stock__stock__FINCODE, transcation_type and month, and made sum of quantity. The code and result is as below

cols = ['stock__stock__FINCODE', 'month', 'transcation_type']
df2 = df.groupby(cols)[['quantity']].sum().add_suffix('_sum')
                                                   quantity_sum
stock__stock__FINCODE month      transcation_type              
100112                2022-06-01 buy                     1500.0
                      2022-08-01 sell                    1500.0
100325                2022-06-01 buy                      100.0

but I want the result on each month and if a sell happens in same month then need to substract with the buy. If any month is not available then add that month and make the quantity same as previous month.

the result should be some thing like below.

                                             
stock__stock__FINCODE 2022-06-01 2022-07-01 2022-08-01        
100112                1500       1500       0
100325                100        100        100

the date column is basically monthly quantity. It should start from the smallest date to till now.

can someone please help me with this?

Upvotes: 0

Views: 61

Answers (1)

sayan dasgupta
sayan dasgupta

Reputation: 1082

Here you go. You will need to create a daterange index

date_range_index = pd.date_range(start=df.month.min(),end=df.month.max(), freq='MS')
date_range_index_df = pd.DataFrame(index=date_range_index)

Unstack df2 to create 2 columns one for buy one for sell

cols = ['stock__stock__FINCODE', 'month', 'transcation_type']
df2 = df.groupby(cols)[['quantity']].sum().add_suffix('_sum')
df2_unstack = df2.unstack(level='transcation_type').reset_index()
df2_unstack.columns = df2_unstack.columns.map('_'.join)
df2_unstack = df2_unstack.set_index(pd.to_datetime(df2_unstack.month_)).fillna(0)
df2_unstack

Use pandas reindex to create a function and get the cumulative

#df2.sort_index()
def apply_on_each_stock(gdf: pd.DataFrame):
    gdf_up = gdf.reindex(date_range_index,fill_value=0).sort_index(level='month_')
    gdf_up['cumbuy'] = gdf_up['quantity_sum_buy'].cumsum()
    gdf_up['cumsell'] = gdf_up['quantity_sum_sell'].cumsum()
    gdf_up['qty_rem'] = gdf_up['cumbuy'] - gdf_up['cumsell']
    return(gdf_up)

df2_ops = df2_unstack.groupby(['stock__stock__FINCODE_']).apply(
    lambda row: apply_on_each_stock(row)
)

And then you reshape as you want

df2_ops['qty_rem'].unstack(level=1)


2022-06-01  2022-07-01  2022-08-01
stock__stock__FINCODE_          
100112  1500.0  1500.0  0.0
100325  100.0   100.0   100.0

Upvotes: 1

Related Questions