Reputation: 63619
I have a pandas DataFrame with multiindex where I want to select all rows between 11am and 1pm.
import pandas as pd
data = [
('Jack', '2020-01-01 10:00:00', 12),
('Jack', '2020-01-01 11:00:00', 13),
('Jack', '2020-01-01 12:00:00', 14),
('Jack', '2020-01-01 13:00:00', 15),
('Jack', '2020-01-01 14:00:00', 16),
('Ryan', '2020-01-01 10:00:00', 34),
('Ryan', '2020-01-01 11:00:00', 35),
('Ryan', '2020-01-01 12:00:00', 36),
('Ryan', '2020-01-01 13:00:00', 37),
('Ryan', '2020-01-01 14:00:00', 38),
]
df = pd.DataFrame(data, columns=['name', 'datetime', 'score']).set_index(['name','datetime'])
# score
# name datetime
# Jack 2020-01-01 10:00:00 12
# 2020-01-01 11:00:00 13
# 2020-01-01 12:00:00 14
# 2020-01-01 13:00:00 15
# 2020-01-01 14:00:00 16
# Ryan 2020-01-01 10:00:00 34
# 2020-01-01 11:00:00 35
# 2020-01-01 12:00:00 36
# 2020-01-01 13:00:00 37
# 2020-01-01 14:00:00 38
My current solution requires converting all the multiindex to regular columns, converting the datetime
column to an indexer which is then used to select the desired rows. The multiindex is then rebuilt.
df = df.reset_index()
indexer = pd.DatetimeIndex(df['datetime'])
df = df.loc[indexer.indexer_between_time('11:00', '13:00')].set_index(['name', 'datetime'])
# score
# name datetime
# Jack 2020-01-01 11:00:00 13
# 2020-01-01 12:00:00 14
# 2020-01-01 13:00:00 15
# Ryan 2020-01-01 11:00:00 35
# 2020-01-01 12:00:00 36
# 2020-01-01 13:00:00 37
Question: Is it possible to directly use the 2nd level of the multiindex as the indexer, thus avoiding having to reset_index
and set_index
?
Or is there an even better method to achieve the filtering of rows between 2 different times?
I am using Python 3.7.4 and pandas 0.25.1. Willing to upgrade to newer versions if they allow better solutions
Upvotes: 1
Views: 63
Reputation: 1327
df.loc[(slice(None),slice('2020-01-01 11:00:00','2020-01-01 13:00:00')),:]
output:
score
name datetime
Jack 2020-01-01 11:00:00 13
2020-01-01 12:00:00 14
2020-01-01 13:00:00 15
Ryan 2020-01-01 11:00:00 35
2020-01-01 12:00:00 36
2020-01-01 13:00:00 37
Upvotes: 1
Reputation: 75080
You can use the index directly with get_level_values
and pd.IndexSlice
:
indexer = (pd.DatetimeIndex(df.index.get_level_values('datetime'))
.indexer_between_time('11:00', '13:00'))
df.loc[pd.IndexSlice[:, df.index.get_level_values('datetime')[indexer]], :]
score
name datetime
Jack 2020-01-01 11:00:00 13
2020-01-01 12:00:00 14
2020-01-01 13:00:00 15
Ryan 2020-01-01 11:00:00 35
2020-01-01 12:00:00 36
2020-01-01 13:00:00 37
Upvotes: 2