Sean
Sean

Reputation: 8731

How do I filter multiIndexed pandas DataFrame by a column value

Given a DataFrame of stock prices, I am interested in filtering based on the latest closing price. I am aware of how to do this for a simple DataFrame, but cannot figure out how to do it for a multi-indexed dataframe.

Simple dataframe:

              AAPL     AMZN     GOOG    MSFT
2021-02-08  136.91  3322.94  2092.91  242.47
2021-02-09  136.01  3305.00  2083.51  243.77
2021-02-10  135.39  3286.58  2095.38  242.82
2021-02-11  135.13  3262.13  2095.89  244.49
2021-02-12  135.37  3277.71  2104.11  244.99

Operation: df.loc[:,df.iloc[-1] < 250]

Output:

              AAPL    MSFT
2021-02-08  136.91  242.47
2021-02-09  136.01  243.77
2021-02-10  135.39  242.82
2021-02-11  135.13  244.49
2021-02-12  135.37  244.99

However I cannot figure out how to accomplish this on a DataFrame with a MultiIndex (such as OHLC)

Multiindex DataFrame:

             Close                              High                               Low  ...            Open                              Volume                            
              AAPL     AMZN     GOOG    MSFT    AAPL     AMZN     GOOG    MSFT    AAPL  ...    MSFT    AAPL     AMZN     GOOG    MSFT      AAPL     AMZN     GOOG      MSFT
2021-02-08  136.91  3322.94  2092.91  242.47  136.96  3365.00  2123.55  243.68  134.92  ...  240.81  136.03  3358.50  2105.91  243.15  71297200  3257400  1241900  22211900
2021-02-09  136.01  3305.00  2083.51  243.77  137.88  3338.00  2105.13  244.76  135.85  ...  241.38  136.62  3312.49  2078.54  241.87  76774200  2203500   889900  23565000
2021-02-10  135.39  3286.58  2095.38  242.82  136.99  3317.95  2108.37  245.92  134.40  ...  240.89  136.48  3314.00  2094.21  245.00  73046600  3151600  1135500  22186700
2021-02-11  135.13  3262.13  2095.89  244.49  136.39  3292.00  2102.03  245.15  133.77  ...  242.15  135.90  3292.00  2099.51  244.78  64280000  2301400   945700  15751100
2021-02-12  135.37  3277.71  2104.11  244.99  135.53  3280.25  2108.82  245.30  133.69  ...  242.73  134.35  3250.00  2090.25  243.93  60029300  2329300   855700  16552000

[5 rows x 20 columns]

Filter: df_filter = df.iloc[-1].loc['Close'] < 250

AAPL     True
AMZN    False
GOOG    False
MSFT     True
Name: 2021-02-12 00:00:00, dtype: bool

Operation???:

Maybe something like df.loc[:,filter] but I receive the error:

pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match)

I understand it's a multi-index so I also tried using pd.IndexSlice: df.loc[:,idx[:,df_filter]] but still get:

ValueError: cannot index with a boolean indexer that is not the same length as the index

Desired Output:

             Close            High             Low            Open            Volume          
              AAPL    MSFT    AAPL    MSFT    AAPL    MSFT    AAPL    MSFT      AAPL      MSFT
2021-02-08  136.91  242.47  136.96  243.68  134.92  240.81  136.03  243.15  71297200  22211900
2021-02-09  136.01  243.77  137.88  244.76  135.85  241.38  136.62  241.87  76774200  23565000
2021-02-10  135.39  242.82  136.99  245.92  134.40  240.89  136.48  245.00  73046600  22186700
2021-02-11  135.13  244.49  136.39  245.15  133.77  242.15  135.90  244.78  64280000  15751100
2021-02-12  135.37  244.99  135.53  245.30  133.69  242.73  134.35  243.93  60029300  16552000

Upvotes: 0

Views: 39

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150815

I'm not sure if IndexSlice works with boolean indexing. You can try passing the valid index:

df.loc[:,pd.IndexSlice[:, df_filter.index[df_filter]]]

Upvotes: 1

Related Questions