Reputation: 783
I am working on a trading algo but I have some issues when trying to combine the buy_orders and sell_orders dataframes to a single dataframe, orders.
The issue shows it self on the buy_order date 2021-01-21 where a recommendation was made by my algo to buy, but this has no sell order yet as the signal hasn't been spotted yet, therefore these should be NAN when merged. If I was to join on index the sell order would be the sell order for a different stock from the sell_orders dataframe.
buy_orders dataframe
Date_buy Name Stock_Price_buy Order
26 2020-07-30 AAPL 96.19 buy
27 2020-09-30 AAPL 115.81 buy
28 2020-11-05 AAPL 119.03 buy
29 2020-11-30 AAPL 119.05 buy
30 2021-01-21 AAPL 136.87 buy
31 2020-10-11 ABBV 21.21 buy
sell_orders dataframe
Date_sell Name Stock_Price_sell Order
25 2020-07-20 AAPL 98.36 sell
26 2020-09-02 AAPL 131.40 sell
27 2020-10-20 AAPL 117.51 sell
28 2020-11-20 AAPL 117.34 sell
29 2021-01-04 AAPL 129.41 sell
30 2020-10-15 ABBV 24.23 sell
Ideal result would be the orders dataframe as demonstrated below.
Index Buy_date Name_x Stock_Price_buy Order_x Sell_date Name_y Stock_Price_buy Order_y
26 2020-07-30 AAPL 96.19 buy 2020-09-02 AAPL 131.40 sell
27 2020-09-30 AAPL 115.81 buy 2020-10-20 AAPL 117.51 sell
28 2020-11-05 AAPL 119.03 buy 2020-11-20 AAPL 117.34 sell
29 2020-11-30 AAPL 119.05 buy 2021-01-04 AAPL 129.41 sell
30 2021-01-21 AAPL 136.87 buy NaN NaN NaN NaN
Here's how the orders dataframe looks like now when buy_orders.Name_x and sell_orders.Name_y are different for the first time. ABBV sell_order should have been NANs
28 2020-11-05 AAPL 119.03 buy 2020-11-20 AAPL 117.34 sell
29 2020-11-30 AAPL 119.05 buy 2021-01-04 AAPL 129.41 sell
30 2021-01-21 AAPL 136.87 buy 2018-05-24 ABBV 24.23 sell
Upvotes: 1
Views: 191
Reputation: 1012
Assuming your data is structured so that it always starts with a buy and alternates with sell orders, and only has one transaction per day, and is always one sized lot per transaction... you can use pd.concat
. I made a simple dataframe that is kind of like yours (and in the future, it makes it easier if you include code to make a sample dataframe as part of your question):
buy_orders = pd.DataFrame.from_dict({'Date_buy': [ pd.to_datetime('2020-11-01'), pd.to_datetime('2020-11-03'),
pd.to_datetime('2020-11-05'), pd.to_datetime('2020-11-08'),
pd.to_datetime('2020-11-10')],
'Order' : ['B','B','B','B','B'],
'Name' : ['AAPL','AAPL','AAPL','AAPL','ABBV'],
'Stock_Price_buy' : [1,2,3,4,5.0]})
sell_orders = pd.DataFrame.from_dict({'Date_sell': [ pd.to_datetime('2020-11-02'), pd.to_datetime('2020-11-04'),
pd.to_datetime('2020-11-06'), pd.to_datetime('2020-11-12'),
pd.to_datetime('2020-11-22')],
'Order' : ['S','S','S','S','S'],
'Name' : ['AAPL','AAPL','AAPL','ABBV','ABBV'],
'Stock_Price_sell' : [23,24,25,26,5.0]})
You can first stack the two dataframes and sort them by date and ticker (after normalizing the column names):
buy_orders = buy_orders.rename(columns={'Date_buy' : "Date", "Stock_Price_buy" : "Price"})
sell_orders = sell_orders.rename(columns={'Date_sell' : "Date", "Stock_Price_sell" : "Price"})
df = pd.concat([buy_orders, sell_orders])
df = df.sort_values(['Date','Order']).reset_index(drop=True)
...then making a copy of the dataframe (changing the column names to keep them distinct in the later concat step):
df2 = df.copy()
df2.columns = [f"{c}_sell" for c in df.columns]
You then concatenate the two dataframes next to each other, but with a .shift(-1)
on the second one so that they're offset
df3 = pd.concat([df, df2.shift(-1)], axis=1)
Finally, you can clean up the junk rows:
cut = ( df3.Name != df3.Name_sell)
import numpy as np
df3.loc[cut, 'Date_sell'] = np.nan
df3.loc[cut, 'Order_sell'] = np.nan
df3.loc[cut, 'Price_sell'] = np.nan
df3 = df3.drop(columns='Name_sell')
df3 = df3[df3.Order!="S"].reset_index(drop=True).copy()
That gives you something like
Date Order Name Price Date_sell Order_sell Price_sell
0 2020-11-01 B AAPL 1.0 2020-11-02 S 23.0
1 2020-11-03 B AAPL 2.0 2020-11-04 S 24.0
2 2020-11-05 B AAPL 3.0 2020-11-06 S 25.0
3 2020-11-08 B AAPL 4.0 NaT NaN NaN
4 2020-11-10 B ABBV 5.0 2020-11-12 S 26.0
You don't have to make all the intermediate dataframes, etc, but I left the code that way here so that if you paste things in a notebook you can look at the steps.
Upvotes: 0
Reputation: 26676
Have you thought of join but then declaring suffixes as follows?.
buy_orders.join(sell_orders,lsuffix='_buy', rsuffix='_sell')
Date_buy Name_buy Stock_Price_buy Order_buy Date_sell Name_sell \
26 2020-07-30 AAPL 96.19 buy 2020-09-02 AAPL
27 2020-09-30 AAPL 115.81 buy 2020-10-20 AAPL
28 2020-11-05 AAPL 119.03 buy 2020-11-20 AAPL
29 2020-11-30 AAPL 119.05 buy 2021-01-04 AAPL
30 2021-01-21 AAPL 136.87 buy NaN NaN
Stock_Price_sell Order_sell
26 131.40 sell
27 117.51 sell
28 117.34 sell
29 129.41 sell
30 NaN NaN
Upvotes: 2