Iterate by level in Pandas multi-index

Question

Essentially, I have a multi-index that is first date-time, then integer value of the week number.

I want to select a week number, check some conditions of the data for that week, and if those conditions are satisfied, get the date-time level of the index.

Here's a simplified version:

# Dataframe creation
index = pd.date_range('1/1/2019', '1/4/2019')
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B':[3, 4, 5, 6], 'C':['a', 'a', 'b', 'b']}, index=index)

print(df)

            A  B  C
2019-01-01  1  3  a
2019-01-02  2  4  a
2019-01-03  3  5  b
2019-01-04  4  6  b

Then,

# make a multi-index with last column
df = df.set_index([df.index, 'C'])
print(df)

              A  B
           C      
2019-01-01 a  1  3
2019-01-02 a  2  4
2019-01-03 b  3  5
2019-01-04 b  4  6

Again, fine. According to this answer, I would expect that df.loc[['a']] would yield

              A  B
           C      
2019-01-01 a  1  3
2019-01-02 a  2  4

However, I get a key/index error. I instead tried df[df['C'] == 'a'], but also with a resulting key/index error.

So the psuedo-code for what I want to achieve is something like:

df = df.set_index([df.index, 'C'])

for value in 'C' level:
    check some condition on 'B' variable.
    if condition:
       get date level for this value

This seems like it would be fairly simple if I could figure out how to slice by specified index level... What am I missing?

Note:

I realize that I could just leave the C variable as a column and then do something like:

for c in df.C.unique():
    if (df[df.C == c].B >= 4).any():
        dates = df[df.C == c].index
        print(dates)

However, now I'm fixated on understanding multi-indexes just to learn Pandas better.

Quang Hoang · Accepted Answer

Quick fix: query works with level names:

df.query('C=="a"')

Output:

              A  B
           C      
2019-01-01 a  1  3
2019-01-02 a  2  4

A bit more details on multilevel indexing from the official doc. If you have multilevel index, you slice it by tuples (first_level, second_level):

df.loc[('2019/01/01', 'a')]

So you can get all the rows with a second-level value:

df.loc[(slice(None), 'a'), :]

and you get:

              A  B
           C      
2019-01-01 a  1  3
2019-01-02 a  2  4

or you can use xs:

df.xs('a', level='C')

which give (note that the second level index is gone):

            A  B
2019-01-01  1  3
2019-01-02  2  4

Finally, what you are trying to do for c in df.C.unique(): is more of groupby:

df.loc[df.groupby('C')['B'].transform('max')>=4].index

Output:

MultiIndex([('2019-01-01', 'a'),
            ('2019-01-02', 'a'),
            ('2019-01-03', 'b'),
            ('2019-01-04', 'b')],
           names=[None, 'C'])

Iterate by level in Pandas multi-index

Answers (2)

Related Questions