Reputation: 39
I have the following two dataframes, and my desired output is open_orders reduced by cancel_orders, so I know how many open orders remain.
Desired Output:
df_total_orders

       order_id business_symbol  open_orders
0  a1b2c3111111              AA          0.0
1  4kl3l2242244             AAA          0.0
2  21312a224144             BBB          0.0
3  124f32rv2323             CCC          0.0
4  2412dfe21edf             DDD          0.0

[5555 rows x 3 columns]
What I have now:
df_add_orders (this dataframe contains duplicates in the order_id column)

       order_id business_symbol  open_orders
0  a1b2c3111111              AA        100.0
1  4kl3l2242244             AAA       1000.0
2  21312a224144             BBB       1000.0
3  124f32rv2323             CCC       1000.0
4  2412dfe21edf             DDD       1000.0

[10000 rows x 3 columns]
df_cancel_orders (this dataframe also contains duplicates in the order_id column)

       order_id  cancel_orders
0  a1b2c3111111          100.0
1  4kl3l2242244         1000.0
2  21312a224144         1000.0
3  124f32rv2323         1000.0
4  2412dfe21edf         1000.0

[4000 rows x 2 columns]
I am using the following groupby to get the total open orders per order_id, but it drops the business_symbol column from the output.
df_add_orders_group = df_add_orders.groupby(['order_id'], as_index=False)['open_orders'].sum()
df_add_orders_group

       order_id  open_orders
0  a1b2c3111111        110.0
1  4kl3l2242244       1200.0
2  21312a224144       1500.0
3  124f32rv2323       1500.0
4  2412dfe21edf       1500.0

[5000 rows x 2 columns]
I then aggregate the cancelled orders the same way so I can subtract them:
df_cancel_orders_group = df_cancel_orders.groupby(['order_id'], as_index=False)['cancel_orders'].sum()
but I want to keep the symbols so I can compare the total open orders. I need some way to merge the result of the subtraction back into df_add_orders, and also to clean up order_id by summing its duplicates.
Upvotes: 0
Views: 223
Reputation: 3591
df_total_orders = df_add_orders.merge(df_cancel_orders,
                                      how='left',
                                      on='order_id')
will get you a dataframe with the data from the two original dataframes. You can then do
df_total_orders['open_orders'] = (
    df_total_orders['open_orders']
    - df_total_orders['cancel_orders']
)
and then drop the cancel_orders column.
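As a minimal sketch of this merge-and-subtract approach, with made-up sample data standing in for the question's dataframes (and a groupby first to collapse the duplicate order_ids while keeping the symbol):

```python
import pandas as pd

# Hypothetical sample data with duplicate order_ids, mimicking the question
df_add_orders = pd.DataFrame({
    "order_id": ["a1", "a1", "b2"],
    "business_symbol": ["AA", "AA", "BBB"],
    "open_orders": [100.0, 10.0, 1000.0],
})
df_cancel_orders = pd.DataFrame({
    "order_id": ["a1", "b2"],
    "cancel_orders": [110.0, 1000.0],
})

# Collapse duplicate order_ids first, keeping the business_symbol
df_add_grouped = df_add_orders.groupby("order_id", as_index=False).agg(
    {"business_symbol": "first", "open_orders": "sum"}
)
df_cancel_grouped = df_cancel_orders.groupby(
    "order_id", as_index=False
)["cancel_orders"].sum()

# Left merge so order_ids with no cancellations are kept
df_total_orders = df_add_grouped.merge(df_cancel_grouped,
                                       how="left", on="order_id")

# Unmatched order_ids get NaN in cancel_orders; treat those as zero
df_total_orders["cancel_orders"] = df_total_orders["cancel_orders"].fillna(0)
df_total_orders["open_orders"] -= df_total_orders["cancel_orders"]
df_total_orders = df_total_orders.drop("cancel_orders", axis=1)
print(df_total_orders)
```

The fillna(0) step matters with how='left': any order_id that never appears in df_cancel_orders would otherwise end up with NaN open_orders after the subtraction.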
Another tactic would be to append the two original dataframes together. You can then do groupby over id and sum. This will create a dataframe that has, for each id, one column that is the sum of all open orders for that id, and another column that is the sum of all cancel orders for that id. You can then take the difference between the two columns.
When you append the two dataframes, you'll get a bunch of nulls; the portion from the open orders will have nulls in the cancel order column, and vice versa. I think summing over the groupby will just treat the nulls as zeros, but you might need to explicitly tell it to do so.
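A sketch of this second tactic, using the same hypothetical sample data (pd.concat is used here in place of the deprecated DataFrame.append; groupby().sum() does skip the NaNs by default):

```python
import pandas as pd

# Hypothetical sample data with duplicate order_ids
df_add_orders = pd.DataFrame({
    "order_id": ["a1", "a1", "b2"],
    "business_symbol": ["AA", "AA", "BBB"],
    "open_orders": [100.0, 10.0, 1000.0],
})
df_cancel_orders = pd.DataFrame({
    "order_id": ["a1", "b2"],
    "cancel_orders": [110.0, 1000.0],
})

# Stack the two frames; rows from one frame hold NaN in the other's columns
combined = pd.concat([df_add_orders, df_cancel_orders], ignore_index=True)

# Sum both columns per order_id; NaNs are skipped, so they act as zeros
totals = combined.groupby("order_id", as_index=False)[
    ["open_orders", "cancel_orders"]
].sum()

# Net open orders = summed opens minus summed cancels
totals["open_orders"] -= totals["cancel_orders"]
totals = totals.drop("cancel_orders", axis=1)
print(totals)
```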
Upvotes: 1
Reputation: 23146
Try with pandas.merge and groupby:
merged = pd.merge(df_add_orders.groupby("order_id", as_index=False)
                               .agg({"business_symbol": "first",
                                     "open_orders": "sum"}),
                  df_cancel_orders.groupby("order_id", as_index=False).sum(),
                  on="order_id")
output = (merged.assign(open_orders=merged["open_orders"] - merged["cancel_orders"])
                .drop("cancel_orders", axis=1))
>>> output
       order_id business_symbol  open_orders
0  124f32rv2323             CCC          0.0
1  21312a224144             BBB          0.0
2  2412dfe21edf             DDD          0.0
3  4kl3l2242244             AAA          0.0
4  a1b2c3111111              AA          0.0
Upvotes: 0