keynesiancross
keynesiancross

Reputation: 3529

Python Pandas - Issue appending / concat two multi-indexed Dataframes

I am trying to merge two MultiIndex'ed dataframes. My code is below. The issue, as you can see in the output, is that the "DATE" index is repeated, whereas I'd like all the values (OPEN_INT, PX_LAST) to be on the same date index... any ideas? I've tried both append, and concat but both give me similar results.

            if df.empty:
                df = bbg_historicaldata(t, f, startDate, endDate)
                datesArray = list(df.index)
                tArray = [t for i in range(len(datesArray))]
                arrays = [tArray, datesArray]
                tuples = list(zip(*arrays))
                index = pd.MultiIndex.from_tuples(tuples, names=['TICKER', 'DATE'])                    
                df = pd.DataFrame({f : df[f].values}, index=index)

            else:
                temp = bbg_historicaldata(t,f,startDate,endDate)
                datesArray = list(temp.index)
                tArray = [t for i in range(len(datesArray))]
                arrays = [tArray, datesArray]
                tuples = list(zip(*arrays))
                index = pd.MultiIndex.from_tuples(tuples, names=['TICKER', 'DATE'])


                temp = pd.DataFrame({f : temp[f].values}, index=index)

                #df = df.append(temp, ignore_index = True)
                df = pd.concat([df, temp]).sortlevel()

And the results:

                        OPEN_INT  PX_LAST
TICKER      DATE                          
EDH8 COMDTY 2017-02-01        NaN   98.365
            2017-02-01  1008044.0      NaN
            2017-02-02        NaN   98.370
            2017-02-02  1009994.0      NaN
            2017-02-03        NaN   98.360
            2017-02-03  1019181.0      NaN
            2017-02-06        NaN   98.405
            2017-02-06  1023863.0      NaN
            2017-02-07        NaN   98.410
            2017-02-07  1024609.0      NaN
            2017-02-08        NaN   98.435
            2017-02-08  1046258.0      NaN
            2017-02-09        NaN   98.395

Essentially want to get it so there are no NaNs!

EDIT: Adding "axis = 1" to the concat resulted in the following (my fault for not including the additional output in the firstplace)

                        PX_LAST   OPEN_INT  PX_LAST  OPEN_INT  PX_LAST  \
TICKER      DATE                                                         
EDH8 COMDTY 2017-02-01   98.365  1008044.0      NaN       NaN      NaN   
            2017-02-02   98.370  1009994.0      NaN       NaN      NaN   
            2017-02-03   98.360  1019181.0      NaN       NaN      NaN   
            2017-02-06   98.405  1023863.0      NaN       NaN      NaN   
            2017-02-07   98.410  1024609.0      NaN       NaN      NaN   
            2017-02-08   98.435  1046258.0      NaN       NaN      NaN   
            2017-02-09   98.395  1050291.0      NaN       NaN      NaN   
EDM8 COMDTY 2017-02-01      NaN        NaN   98.245  726739.0      NaN   
            2017-02-02      NaN        NaN   98.250  715081.0      NaN   
            2017-02-03      NaN        NaN   98.235  723936.0      NaN   
            2017-02-06      NaN        NaN   98.285  729324.0      NaN   
            2017-02-07      NaN        NaN   98.295  728673.0      NaN   
            2017-02-08      NaN        NaN   98.325  728520.0      NaN   
            2017-02-09      NaN        NaN   98.280  741840.0      NaN   
EDU8 COMDTY 2017-02-01      NaN        NaN      NaN       NaN   98.130   
            2017-02-02      NaN        NaN      NaN       NaN   98.135   
            2017-02-03      NaN        NaN      NaN       NaN   98.120   
            2017-02-06      NaN        NaN      NaN       NaN   98.180   
            2017-02-07      NaN        NaN      NaN       NaN   98.190   
            2017-02-08      NaN        NaN      NaN       NaN   98.225   
            2017-02-09      NaN        NaN      NaN       NaN   98.175  

Thanks!

Upvotes: 2

Views: 83

Answers (2)

Igor Raush
Igor Raush

Reputation: 15240

You need to concatenate along the other axis:

pd.concat([df, temp], axis=1)

By default, Pandas concatenates rows and aligns columns, which lead to the result you saw.

Upvotes: 1

littlefighter
littlefighter

Reputation: 51

It is not clear what the format of input is.

I assumed OPEN_INT looks like this:

import datetime
import pandas as pd


open_int = pd.DataFrame(
    [
        (datetime.date(2017, 2, 1), 1008044.0),
        (datetime.date(2017, 2, 2), 1009994.0),
        (datetime.date(2017, 2, 3), 1019181.0),
        (datetime.date(2017, 2, 6), 1023863.0),
        (datetime.date(2017, 2, 7), 1024609.0),
        (datetime.date(2017, 2, 8), 1046258.0),
    ],
    columns=['DATE', 'OPEN_INT']
)
open_int['TICKER'] = 'EDH8 COMDTY'
open_int.set_index(['TICKER', 'DATE'], inplace=True)

print(open_int)
#                          OPEN_INT
# TICKER      DATE
# EDH8 COMDTY 2017-02-01  1008044.0
#             2017-02-02  1009994.0
#             2017-02-03  1019181.0
#             2017-02-06  1023863.0
#             2017-02-07  1024609.0
#             2017-02-08  1046258.0

And PX_LAST looks like this:

px_last = pd.DataFrame(
    [
        (datetime.date(2017, 2, 1), 98.365),
        (datetime.date(2017, 2, 2), 98.370),
        (datetime.date(2017, 2, 3), 98.360),
        (datetime.date(2017, 2, 6), 98.405),
        (datetime.date(2017, 2, 7), 98.410),
        (datetime.date(2017, 2, 8), 98.435),
        (datetime.date(2017, 2, 9), 98.395),

    ],
    columns=['DATE', 'PX_LAST']
)
px_last['TICKER'] = 'EDH8 COMDTY'
px_last.set_index(['TICKER', 'DATE'], inplace=True)

print(px_last)
#                         PX_LAST
# TICKER      DATE
# EDH8 COMDTY 2017-02-01   98.365
#             2017-02-02   98.370
#             2017-02-03   98.360
#             2017-02-06   98.405
#             2017-02-07   98.410
#             2017-02-08   98.435
#             2017-02-09   98.395

Then you concat them and get what you want:

df = pd.concat([open_int, px_last], axis=1)
print(df)
#                          OPEN_INT  PX_LAST
# TICKER      DATE
# EDH8 COMDTY 2017-02-01  1008044.0   98.365
#             2017-02-02  1009994.0   98.370
#             2017-02-03  1019181.0   98.360
#             2017-02-06  1023863.0   98.405
#             2017-02-07  1024609.0   98.410
#             2017-02-08  1046258.0   98.435
#             2017-02-09        NaN   98.395

Upvotes: 1

Related Questions