Reputation: 3529
I am trying to merge two MultiIndex'ed dataframes. My code is below. The issue, as you can see in the output, is that the "DATE" index is repeated, whereas I'd like all the values (OPEN_INT, PX_LAST) to be on the same date index... any ideas? I've tried both append, and concat but both give me similar results.
if df.empty:
df = bbg_historicaldata(t, f, startDate, endDate)
datesArray = list(df.index)
tArray = [t for i in range(len(datesArray))]
arrays = [tArray, datesArray]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['TICKER', 'DATE'])
df = pd.DataFrame({f : df[f].values}, index=index)
else:
temp = bbg_historicaldata(t,f,startDate,endDate)
datesArray = list(temp.index)
tArray = [t for i in range(len(datesArray))]
arrays = [tArray, datesArray]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['TICKER', 'DATE'])
temp = pd.DataFrame({f : temp[f].values}, index=index)
#df = df.append(temp, ignore_index = True)
df = pd.concat([df, temp]).sortlevel()
And the results:
OPEN_INT PX_LAST
TICKER DATE
EDH8 COMDTY 2017-02-01 NaN 98.365
2017-02-01 1008044.0 NaN
2017-02-02 NaN 98.370
2017-02-02 1009994.0 NaN
2017-02-03 NaN 98.360
2017-02-03 1019181.0 NaN
2017-02-06 NaN 98.405
2017-02-06 1023863.0 NaN
2017-02-07 NaN 98.410
2017-02-07 1024609.0 NaN
2017-02-08 NaN 98.435
2017-02-08 1046258.0 NaN
2017-02-09 NaN 98.395
Essentially want to get it so there are no NaNs!
EDIT: Adding "axis = 1" to the concat resulted in the following (my fault for not including the additional output in the firstplace)
PX_LAST OPEN_INT PX_LAST OPEN_INT PX_LAST \
TICKER DATE
EDH8 COMDTY 2017-02-01 98.365 1008044.0 NaN NaN NaN
2017-02-02 98.370 1009994.0 NaN NaN NaN
2017-02-03 98.360 1019181.0 NaN NaN NaN
2017-02-06 98.405 1023863.0 NaN NaN NaN
2017-02-07 98.410 1024609.0 NaN NaN NaN
2017-02-08 98.435 1046258.0 NaN NaN NaN
2017-02-09 98.395 1050291.0 NaN NaN NaN
EDM8 COMDTY 2017-02-01 NaN NaN 98.245 726739.0 NaN
2017-02-02 NaN NaN 98.250 715081.0 NaN
2017-02-03 NaN NaN 98.235 723936.0 NaN
2017-02-06 NaN NaN 98.285 729324.0 NaN
2017-02-07 NaN NaN 98.295 728673.0 NaN
2017-02-08 NaN NaN 98.325 728520.0 NaN
2017-02-09 NaN NaN 98.280 741840.0 NaN
EDU8 COMDTY 2017-02-01 NaN NaN NaN NaN 98.130
2017-02-02 NaN NaN NaN NaN 98.135
2017-02-03 NaN NaN NaN NaN 98.120
2017-02-06 NaN NaN NaN NaN 98.180
2017-02-07 NaN NaN NaN NaN 98.190
2017-02-08 NaN NaN NaN NaN 98.225
2017-02-09 NaN NaN NaN NaN 98.175
Thanks!
Upvotes: 2
Views: 83
Reputation: 15240
You need to concatenate along the other axis:
pd.concat([df, temp], axis=1)
By default, Pandas concatenates rows and aligns columns, which lead to the result you saw.
Upvotes: 1
Reputation: 51
It is not clear what the format of input is.
I assumed OPEN_INT looks like this:
import datetime
import pandas as pd
open_int = pd.DataFrame(
[
(datetime.date(2017, 2, 1), 1008044.0),
(datetime.date(2017, 2, 2), 1009994.0),
(datetime.date(2017, 2, 3), 1019181.0),
(datetime.date(2017, 2, 6), 1023863.0),
(datetime.date(2017, 2, 7), 1024609.0),
(datetime.date(2017, 2, 8), 1046258.0),
],
columns=['DATE', 'OPEN_INT']
)
open_int['TICKER'] = 'EDH8 COMDTY'
open_int.set_index(['TICKER', 'DATE'], inplace=True)
print(open_int)
# OPEN_INT
# TICKER DATE
# EDH8 COMDTY 2017-02-01 1008044.0
# 2017-02-02 1009994.0
# 2017-02-03 1019181.0
# 2017-02-06 1023863.0
# 2017-02-07 1024609.0
# 2017-02-08 1046258.0
And PX_LAST looks like this:
px_last = pd.DataFrame(
[
(datetime.date(2017, 2, 1), 98.365),
(datetime.date(2017, 2, 2), 98.370),
(datetime.date(2017, 2, 3), 98.360),
(datetime.date(2017, 2, 6), 98.405),
(datetime.date(2017, 2, 7), 98.410),
(datetime.date(2017, 2, 8), 98.435),
(datetime.date(2017, 2, 9), 98.395),
],
columns=['DATE', 'PX_LAST']
)
px_last['TICKER'] = 'EDH8 COMDTY'
px_last.set_index(['TICKER', 'DATE'], inplace=True)
print(px_last)
# PX_LAST
# TICKER DATE
# EDH8 COMDTY 2017-02-01 98.365
# 2017-02-02 98.370
# 2017-02-03 98.360
# 2017-02-06 98.405
# 2017-02-07 98.410
# 2017-02-08 98.435
# 2017-02-09 98.395
Then you concat them and get what you want:
df = pd.concat([open_int, px_last], axis=1)
print(df)
# OPEN_INT PX_LAST
# TICKER DATE
# EDH8 COMDTY 2017-02-01 1008044.0 98.365
# 2017-02-02 1009994.0 98.370
# 2017-02-03 1019181.0 98.360
# 2017-02-06 1023863.0 98.405
# 2017-02-07 1024609.0 98.410
# 2017-02-08 1046258.0 98.435
# 2017-02-09 NaN 98.395
Upvotes: 1