Reputation: 443
I have 2 below dataframes from 2 sources, the 3 white columns are indexes. These are from 2 reports about historical trades. the trades can only be compared when 3 columns "Trade date" "Exchange Instrument" and "Prompt date" are the same. "Trade date" is because they were report in chronological order. and the futures contracts are only the same when "Exchange Instrument" and "Prompt date" are the same.
I simply want to merge the 2 dfs so that there will be 7 columns with the same 3 indexes. it maybe challenging that the values are not unique for the same index on both reports: for example for the 1st August, CMX Cu contract with prompt 2020-03-01, there have been 3 trades and different prices:
I tried concat and merge, but never get a desired df.. for example while trying
df_complete= pd.concat([df_ctrm_timelined, df_broker_timelined],axis=1)
I get
ValueError: cannot handle a non-unique multi-index!
if you need the raw data, these are the first 10 rows of both dfs, the 2 dfs all have the same number of rows
df_broker_timelined[:10]
broker_Lots broker_TradePrice \
Trade Date Exchange Instrument PromptDate
2019-08-01 CMX Cu 2020-03-01 -1 2.6840
2020-03-01 -1 2.6865
2020-03-01 -2 2.6870
2019-09-01 1 2.6640
2019-09-01 1 2.6665
2019-09-01 2 2.6670
LME Al 2019-10-16 6 1777.5000
2019-11-01 -3 1779.0000
2019-11-01 -1 1779.0000
2019-11-01 -2 1779.0000
broker_Quantity
Trade Date Exchange Instrument PromptDate
2019-08-01 CMX Cu 2020-03-01 -25000
2020-03-01 -25000
2020-03-01 -50000
2019-09-01 25000
2019-09-01 25000
2019-09-01 50000
LME Al 2019-10-16 150
2019-11-01 -75
2019-11-01 -25
2019-11-01 -50
df_ctrm_timelined[:10]
ctrm_TradePrice ctrm_Lots \
Trade Date Exchange Instrument PromptDate
2019-08-01 CMX Cu 2019-09-30 2.6640 1
2019-09-30 2.6665 1
2019-09-30 2.6670 2
2020-03-31 2.6840 -1
2020-03-31 2.6865 -1
2020-03-31 2.6870 -2
LME Al 2019-10-16 1777.5000 6
2019-11-01 1792.5000 3
2019-11-01 1792.5000 3
2019-11-01 1781.5000 -6
ctrm_Quantity Strategy
Trade Date Exchange Instrument PromptDate
2019-08-01 CMX Cu 2019-09-30 25000 Strategy 1
2019-09-30 25000 Strategy 1
2019-09-30 50000 Strategy 1
2020-03-31 -25000 Strategy 1
2020-03-31 -25000 Strategy 1
2020-03-31 -50000 Strategy 1
LME Al 2019-10-16 150 Strategy 2
2019-11-01 75 Strategy 2
2019-11-01 75 Strategy 2
2019-11-01 -150 Strategy 2
Upvotes: 0
Views: 65
Reputation: 153460
Dealing with non-unique indexing:
from seaborn import load_dataset
#Create one dataframe with unique indexes, set multiindex
df = load_dataset('tips')
df = df.set_index(['day', 'time', 'sex', 'smoker'])
#Create a unique label per inner most index
df = df.set_index(df.groupby(level=[0,1,2,3]).cumcount(), append=True)
#Create second dataframe
df2 = df * 2
#Use join
df.join(df2, lsuffix='_2')
Output:
total_bill_2 tip_2 size_2 total_bill tip size
day time sex smoker
Sun Dinner Female No 0 16.99 1.01 2 33.98 2.02 4
Male No 0 10.34 1.66 3 20.68 3.32 6
1 21.01 3.50 3 42.02 7.00 6
2 23.68 3.31 2 47.36 6.62 4
Female No 1 24.59 3.61 4 49.18 7.22 8
... ... ... ... ... ... ...
Sat Dinner Male No 30 29.03 5.92 3 58.06 11.84 6
Female Yes 14 27.18 2.00 2 54.36 4.00 4
Male Yes 26 22.67 2.00 2 45.34 4.00 4
No 31 17.82 1.75 2 35.64 3.50 4
Thur Dinner Female No 0 18.78 3.00 2 37.56 6.00 4
[244 rows x 6 columns]
Here's an example using the "tips" dataset:
from seaborn import load_dataset
#Create one dataframe with unique indexes, set multiindex
df = load_dataset('tips')
df = df.set_index(['day', 'time', 'sex', 'smoker'])
df = df.groupby(level=[0,1,2,3]).first().dropna(how='all')
#Create second dataframe
df2 = df * 2
#Use join
df.join(df2, lsuffix='_2')
Output:
total_bill_2 tip_2 size_2 total_bill tip size
day time sex smoker
Thur Lunch Male Yes 19.44 3.00 2.0 38.88 6.00 4.0
No 27.20 4.00 4.0 54.40 8.00 8.0
Female Yes 19.81 4.19 2.0 39.62 8.38 4.0
No 10.07 1.83 1.0 20.14 3.66 2.0
Dinner Female No 18.78 3.00 2.0 37.56 6.00 4.0
Fri Lunch Male Yes 12.16 2.20 2.0 24.32 4.40 4.0
Female Yes 13.42 3.48 2.0 26.84 6.96 4.0
No 15.98 3.00 3.0 31.96 6.00 6.0
Dinner Male Yes 28.97 3.00 2.0 57.94 6.00 4.0
No 22.49 3.50 2.0 44.98 7.00 4.0
Female Yes 5.75 1.00 2.0 11.50 2.00 4.0
No 22.75 3.25 2.0 45.50 6.50 4.0
Sat Dinner Male Yes 38.01 3.00 4.0 76.02 6.00 8.0
No 20.65 3.35 3.0 41.30 6.70 6.0
Female Yes 3.07 1.00 1.0 6.14 2.00 2.0
No 20.29 2.75 2.0 40.58 5.50 4.0
Sun Dinner Male Yes 7.25 5.15 2.0 14.50 10.30 4.0
No 10.34 1.66 3.0 20.68 3.32 6.0
Female Yes 17.51 3.00 2.0 35.02 6.00 4.0
No 16.99 1.01 2.0 33.98 2.02 4.0
Upvotes: 2