mmaarrttoonn
mmaarrttoonn

Reputation: 119

Pandas query multiindex dataframe based on another single index dataframe

I have two dataframes: Data & Positions.

Data has multiindex: 'Date' and 'Symbol'. Positions has a single index: 'Date' (and has a column with the label 'Symbol'). Both 'Date'-s are DatetimeIndexes.

I want to fetch the 'Open' value from Data to Positions for all the dates (indexes) in Positions.

If I try the following:

positions['Open_price'] = data.loc['1997-02-10','AA'].Open

I get the 'Open' value correctly for the 'AA' Symbol on 1997-02-10 for the 'Open_price' field for the whole Positions dataframe. However, my goal is to get 'Open_price' for the relevant Date (index) and Symbol (column value).

So I try this:

positions['Open_price'] = data.loc[positions.index,positions.Symbol].Open

Then I get an error:

KeyError: "None of [DatetimeIndex(['1999-01-01',  ...\n  '2018-07-30'],\n  
        dtype='datetime64[ns]', length=7151, freq='D')] 
are in the [index]"

NOTE: It is important that Data's 'Date' index has missing values (weekends). Positions' 'Date' index has no missing dates.

How can I make this work?

IMPORTANT:

Accepted answer works but I needed to upgrade pandas version 0.20.x to 0.23.4!

Upvotes: 3

Views: 126

Answers (1)

rahlf23
rahlf23

Reputation: 9019

You can use .join() and .rename():

position.join(data, on=['Date','Symbol']).rename(columns={'Open': 'Open_price'})

Here is a full example:

data = pd.DataFrame([['08-02-2018', 'NDA', 0.123], ['08-02-2018','DFA', 0.234],
                    ['08-03-2018', 'NFX', 0.451], ['08-04-2018', 'BBA', 0.453]],
                   columns=['Date', 'Symbol', 'Open']).set_index(['Date', 'Symbol'])

position = pd.DataFrame([['08-02-2018', 'NDA'],['08-03-2018', 'NFX'],
                    ['08-04-2018', 'TYA'],['08-04-2018', 'BBA']],
                   columns=['Date', 'Symbol']).set_index(['Date'])

data.index = data.index.set_levels([pd.to_datetime(data.index.levels[0]), data.index.levels[1]])
position.index = pd.to_datetime(position.index)

position = position.join(data, on=['Date','Symbol']).rename(columns={'Open': 'Open_price'})

Which gives:

data

                    Open
Date       Symbol       
2018-08-02 NDA     0.123
           DFA     0.234
2018-08-03 NFX     0.451
2018-08-04 BBA     0.453

position

           Symbol
Date             
2018-08-02    NDA
2018-08-03    NFX
2018-08-04    TYA
2018-08-04    BBA

Using the above .join() and printing position gives:

           Symbol   Open
Date                    
2018-08-02    NDA  0.123
2018-08-03    NFX  0.451
2018-08-04    TYA    NaN
2018-08-04    BBA  0.453

Upvotes: 0

Related Questions