Python Pandas Multi-index: keeping same length of level=1 with all level=0 indexes

Question

I have a df_ver1 with multi-index index. I want to drop all rows that have different level[1] length then 2. Here is my dataframe below.

In [13]: df_ver1
Out[13]: 
key  nm         0         1         2         3
bar one -0.424972  0.567020  0.276232 -1.087401
    two -0.673690  0.113648 -1.478427  0.524988
baz one  0.404705  0.577046 -1.715002 -1.039268
    two -0.370647 -1.157892 -1.344312  0.844885
foo one  1.075770 -0.109050  1.643563 -1.469388
qux one -1.294524  0.413738  0.276662 -0.472035
    two -0.013960 -0.362543 -0.006154 -0.923061
oof two  1.340309 -1.187678 -2.211372  0.380396

and my ideal output is

In [13]: df_ver1_fixed
Out[13]: 
key  nm         0         1         2         3
bar one -0.424972  0.567020  0.276232 -1.087401
    two -0.673690  0.113648 -1.478427  0.524988
baz one  0.404705  0.577046 -1.715002 -1.039268
    two -0.370647 -1.157892 -1.344312  0.844885
qux one -1.294524  0.413738  0.276662 -0.472035
    two -0.013960 -0.362543 -0.006154 -0.923061

So as you can see I want to drop row with only 1 level[1] index. In other words, I need to have 'one' and 'two' indexes in second level. Is there a pythonic way to do this step? Thanks!

Tai · Accepted Answer

This would also work. You can actually group by multi-index key and filter out the length of the groups not equal to 2.

df.groupby(by='key').filter(lambda x: len(x) == 2) # keep groups with len 2

As @Zero suggested, we can be more specific using the following to specify the set of variables met the requirement, set(['one', 'two']).

df.groupby(by='key').filter(
              lambda x: set(x.index.get_level_values('nm')) == set(['one', 'two']))

key  nm         0         1         2         3
bar one -0.424972  0.567020  0.276232 -1.087401
    two -0.673690  0.113648 -1.478427  0.524988
baz one  0.404705  0.577046 -1.715002 -1.039268
    two -0.370647 -1.157892 -1.344312  0.844885
qux one -1.294524  0.413738  0.276662 -0.472035
    two -0.013960 -0.362543 -0.006154 -0.923061

Another approach: use multi-index selection

sz = df.groupby("key").size()
indexes = sz[sz == 2].index.tolist()  # first-level indexes that we want.
df.loc[indexes] # use loc for selection

key  nm         0         1         2         3
bar one -0.424972  0.567020  0.276232 -1.087401
    two -0.673690  0.113648 -1.478427  0.524988
baz one  0.404705  0.577046 -1.715002 -1.039268
    two -0.370647 -1.157892 -1.344312  0.844885
qux one -1.294524  0.413738  0.276662 -0.472035
    two -0.013960 -0.362543 -0.006154 -0.923061

Python Pandas Multi-index: keeping same length of level=1 with all level=0 indexes

Answers (2)

Related Questions