Reputation: 861
Here is my multi-index dataframe:
# Index Levels
outside = ['G1','G1','G1','G2','G2','G2']
inside = [1,2,3,1,2,3]
hier_index = list(zip(outside,inside))
hier_index = pd.MultiIndex.from_tuples(hier_index)
df = pd.DataFrame(np.random.randn(6,2),index=hier_index,columns=['A','B'])
df.index.names = ['Group','Num']
df
The dataframe looks like this:
A B
Group Num
G1 1 0.147027 -0.479448
2 0.558769 1.024810
3 -0.925874 1.862864
G2 1 -1.133817 0.610478
2 0.386030 2.084019
3 -0.376519 0.230336
What I want to achieve is to return the values in Group
G1
and G2
, Num
1
and 3
, which looks like this:
G1 1 0.147027 -0.479448
3 -0.925874 1.862864
G2 1 -1.133817 0.610478
3 -0.376519 0.230336
I've tried
df.loc[['G1','G2']].loc[[1,3]]
but it shows nothing.
Then I tried
df.xs([['G1','G2'],[1,3]])
but it returns
TypeError: '(['G1', 'G2'], [1, 3])' is an invalid key.
Is there any way I can just make it return the values in Group
G1
and G2
, Num
1
and 3
?
Upvotes: 5
Views: 1216
Reputation: 25997
As an alternative to .loc
, you can also use query
like this:
df.query('Group in ["G1", "G2"] and Num in [1, 3]')
which also returns:
A B
Group Num
G1 1 -1.749477 -0.276759
3 0.888542 -0.656236
G2 1 0.757631 -1.091000
3 -1.203232 1.702107
If you like you can also use parameters in the queries using @
which might come in handy if you have long lists:
num_sel = [1, 3]
df.query('Group in ["G1", "G2"] and Num in @num_sel')
yielding the same output.
You can then also easily add additional constraints on the column entries, e.g. that the values in A
have to be greater than 0
:
df.query('Group in ["G1", "G2"] and Num in [1, 3] and A > 0')
which returns
A B
Group Num
G1 3 0.888542 -0.656236
G2 1 0.757631 -1.091000
Upvotes: 5
Reputation: 12038
The .loc
indexing method of pandas
takes the level of each index as an argument:
df.loc[['G1','G2'], [1,3],:]
Being that your dataframe has 2 indices and 1 level of columns, the .loc
method will take 3 arguments, one for each level of the index and one for the columns.
You can read more with the documentation.
Upvotes: 1
Reputation: 862691
Use DataFrame.loc
with lists:
df1 = df.loc[(['G1','G2'], [1,3]), :]
print (df1)
A B
Group Num
G1 1 2.165594 0.466762
3 0.451996 0.125071
G2 1 2.783947 0.176145
3 0.169508 0.071441
Or use slicers:
idx = pd.IndexSlice
df1 = df.loc[idx[['G1','G2'], [1,3]], :]
print (df1)
A B
Group Num
G1 1 0.617367 -1.010116
3 -0.990257 -1.262942
G2 1 1.336134 -0.198787
3 -0.310426 1.063520
Upvotes: 5