Nyxynyx
Nyxynyx

Reputation: 63619

Selecting the 2nd MultiIndex Level of Pandas DataFrame as an Indexer

I have a pandas DataFrame with multiindex where I want to select all rows between 11am and 1pm.

import pandas as pd

data = [
    ('Jack', '2020-01-01 10:00:00', 12),
    ('Jack', '2020-01-01 11:00:00', 13),
    ('Jack', '2020-01-01 12:00:00', 14),
    ('Jack', '2020-01-01 13:00:00', 15),
    ('Jack', '2020-01-01 14:00:00', 16),
    ('Ryan', '2020-01-01 10:00:00', 34),
    ('Ryan', '2020-01-01 11:00:00', 35),
    ('Ryan', '2020-01-01 12:00:00', 36),
    ('Ryan', '2020-01-01 13:00:00', 37),
    ('Ryan', '2020-01-01 14:00:00', 38),
]
df = pd.DataFrame(data, columns=['name', 'datetime', 'score']).set_index(['name','datetime'])
#                           score
# name datetime                  
# Jack 2020-01-01 10:00:00     12
#      2020-01-01 11:00:00     13
#      2020-01-01 12:00:00     14
#      2020-01-01 13:00:00     15
#      2020-01-01 14:00:00     16
# Ryan 2020-01-01 10:00:00     34
#      2020-01-01 11:00:00     35
#      2020-01-01 12:00:00     36
#      2020-01-01 13:00:00     37
#      2020-01-01 14:00:00     38

My current solution requires converting all the multiindex to regular columns, converting the datetime column to an indexer which is then used to select the desired rows. The multiindex is then rebuilt.

df = df.reset_index()
indexer = pd.DatetimeIndex(df['datetime'])
df = df.loc[indexer.indexer_between_time('11:00', '13:00')].set_index(['name', 'datetime'])
#                           score
# name datetime                  
# Jack 2020-01-01 11:00:00     13
#      2020-01-01 12:00:00     14
#      2020-01-01 13:00:00     15
# Ryan 2020-01-01 11:00:00     35
#      2020-01-01 12:00:00     36
#      2020-01-01 13:00:00     37

Question: Is it possible to directly use the 2nd level of the multiindex as the indexer, thus avoiding having to reset_index and set_index?

Or is there an even better method to achieve the filtering of rows between 2 different times?

I am using Python 3.7.4 and pandas 0.25.1. Willing to upgrade to newer versions if they allow better solutions

Upvotes: 1

Views: 63

Answers (2)

Akhilesh_IN
Akhilesh_IN

Reputation: 1327

df.loc[(slice(None),slice('2020-01-01 11:00:00','2020-01-01 13:00:00')),:]

output:

                          score
name datetime                  
Jack 2020-01-01 11:00:00     13
     2020-01-01 12:00:00     14
     2020-01-01 13:00:00     15
Ryan 2020-01-01 11:00:00     35
     2020-01-01 12:00:00     36
     2020-01-01 13:00:00     37

Upvotes: 1

anky
anky

Reputation: 75080

You can use the index directly with get_level_values and pd.IndexSlice:

indexer = (pd.DatetimeIndex(df.index.get_level_values('datetime'))
           .indexer_between_time('11:00', '13:00'))
df.loc[pd.IndexSlice[:, df.index.get_level_values('datetime')[indexer]], :]     

                          score
name datetime                  
Jack 2020-01-01 11:00:00     13
     2020-01-01 12:00:00     14
     2020-01-01 13:00:00     15
Ryan 2020-01-01 11:00:00     35
     2020-01-01 12:00:00     36
     2020-01-01 13:00:00     37

Upvotes: 2

Related Questions