Andor
Andor

Reputation: 5933

How to get multiple labels in Pandas 1.1.2?

I have a script that worked fine as long as I did not upgraded Pandas (0.24 --> 1.1.2). I have to select a list of dates from a dataframe, which has a multiindex that includes dates and products and the columns are the metrics.

So far i have done this and it worked liked a charm: df.loc[selected_dates]
Where selected_dates is the list of dates. Some dates may be missing, and it's okay, because nan values will be dropped.

Now I have to sacrifice a goat figuratively to achieve the same with the method suggested by the guide (using reindexing) and it is still prone to errors.

How can I simply get these rows again with the new API?

Here's a sample data frame:

import pandas as pd
from random import randint

selected_dates = ['2020-09-02', '2020-09-05', '2020-09-10']

idx = pd.MultiIndex.from_product([pd.date_range('2020-09-01', '2020-09-07'), ['prod1', 'prod2', 'prod3']])
metric1 = [i for i in range(len(idx))]
metric2 = [10*i for i in range(len(idx))]
df = pd.DataFrame(dict(metric1=metric1, metric2=metric2), index=idx)

df.loc[selected_dates] # KeyError

The df data frame will look like this:

                  metric1  metric2
2020-09-01 prod1        0        0
           prod2        1       10
           prod3        2       20
2020-09-02 prod1        3       30
           prod2        4       40
           prod3        5       50
...

Please think in big! These frame are quite large, and selection is done many times. I have already looked at this SO answer, but it's rather relevant for single indices, not multiindex.

Any hints are appreciated!

Upvotes: 1

Views: 235

Answers (1)

jezrael
jezrael

Reputation: 862681

Solution with DataFrame.loc use converted values to datetimes for match, then use IndexSlice with first : for match all values in second level of MultiIndex and second : for match all columns:

idx = pd.IndexSlice

selected_dates = pd.to_datetime(selected_dates)
df = df.loc[idx[selected_dates,:], :] 
print (df)
                  metric1  metric2
2020-09-02 prod1        3       30
           prod2        4       40
           prod3        5       50
2020-09-05 prod1       12      120
           prod2       13      130
           prod3       14      140

Or use Index.get_level_values with Index.isin in boolean indexing:

selected_dates = pd.to_datetime(selected_dates)
df = df[df.index.get_level_values(0).isin(selected_dates)]
print (df)
                  metric1  metric2
2020-09-02 prod1        3       30
           prod2        4       40
           prod3        5       50
2020-09-05 prod1       12      120
           prod2       13      130
           prod3       14      140

Upvotes: 1

Related Questions