Reputation: 6154
Essentially, I have a multi-index that is first date-time, then integer value of the week number.
I want to select a week number, check some conditions of the data for that week, and if those conditions are satisfied, get the date-time level of the index.
Here's a simplified version:
# Dataframe creation
index = pd.date_range('1/1/2019', '1/4/2019')
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B':[3, 4, 5, 6], 'C':['a', 'a', 'b', 'b']}, index=index)
print(df)
A B C
2019-01-01 1 3 a
2019-01-02 2 4 a
2019-01-03 3 5 b
2019-01-04 4 6 b
Then,
# make a multi-index with last column
df = df.set_index([df.index, 'C'])
print(df)
A B
C
2019-01-01 a 1 3
2019-01-02 a 2 4
2019-01-03 b 3 5
2019-01-04 b 4 6
Again, fine. According to this answer, I would expect that df.loc[['a']]
would yield
A B
C
2019-01-01 a 1 3
2019-01-02 a 2 4
However, I get a key/index error. I instead tried df[df['C'] == 'a']
, but also with a resulting key/index error.
So the psuedo-code for what I want to achieve is something like:
df = df.set_index([df.index, 'C'])
for value in 'C' level:
check some condition on 'B' variable.
if condition:
get date level for this value
This seems like it would be fairly simple if I could figure out how to slice by specified index level... What am I missing?
Note:
I realize that I could just leave the C
variable as a column and then do something like:
for c in df.C.unique():
if (df[df.C == c].B >= 4).any():
dates = df[df.C == c].index
print(dates)
However, now I'm fixated on understanding multi-indexes just to learn Pandas better.
Upvotes: 1
Views: 536
Reputation: 150745
Quick fix: query
works with level names:
df.query('C=="a"')
Output:
A B
C
2019-01-01 a 1 3
2019-01-02 a 2 4
A bit more details on multilevel indexing from the official doc. If you have multilevel index, you slice it by tuples (first_level, second_level)
:
df.loc[('2019/01/01', 'a')]
So you can get all the rows with a second-level value:
df.loc[(slice(None), 'a'), :]
and you get:
A B
C
2019-01-01 a 1 3
2019-01-02 a 2 4
or you can use xs
:
df.xs('a', level='C')
which give (note that the second level index is gone):
A B
2019-01-01 1 3
2019-01-02 2 4
Finally, what you are trying to do for c in df.C.unique():
is more of groupby
:
df.loc[df.groupby('C')['B'].transform('max')>=4].index
Output:
MultiIndex([('2019-01-01', 'a'),
('2019-01-02', 'a'),
('2019-01-03', 'b'),
('2019-01-04', 'b')],
names=[None, 'C'])
Upvotes: 2
Reputation: 7852
I think what you want is:
df.loc[(slice(None), 'a'), :]
which will give you:
A B
C
2019-01-01 a 1 3
2019-01-02 a 2 4
This is based on Question 1b of this answer to the question Select rows in pandas MultiIndex DataFrame.
Upvotes: 0