danielhadar
danielhadar

Reputation: 2161

pandas get_level_values for multiple columns

Is there a way to get the result of get_level_values for more than one column?

Given the following DataFrame:

         d
a b c     
1 4 10  16
    11  17
  5 12  18
2 5 13  19
  6 14  20
3 7 15  21

I wish to get the values (i.e. list of tuples) of levels a and c:

[(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]

Notes:

For example:

a_list = df.index.get_level_values('a').values
c_list = df.index.get_level_values('c').values

print([i for i in zip(a_list,c_list)])
[(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]

but it get cumbersome as the number of columns grow.

df = pd.DataFrame({'a':[1,1,1,2,2,3],'b':[4,4,5,5,6,7,],'c':[10,11,12,13,14,15], 'd':[16,17,18,19,20,21]}).set_index(['a','b','c'])

Upvotes: 23

Views: 25528

Answers (4)

Ynjxsjmh
Ynjxsjmh

Reputation: 30032

You can convert the multi index to dataframe, then select the desired columns with labels or with location, then convert the sub dataframe to list of tuples with to_records.

out = df.index.to_frame()[['a', 'c']].to_records(index=False)
# or
out = df.index.to_frame().iloc[:, [0, 2]].to_records(index=False)

Upvotes: 0

Wildhammer
Wildhammer

Reputation: 2175

Simply:

df.index.to_frame().to_numpy()[:, [0,2]]

Upvotes: 0

Alicia Garcia-Raboso
Alicia Garcia-Raboso

Reputation: 13913

The .tolist() method of a MultiIndex gives a list of tuples for all the levels in the MultiIndex. For example, with your example DataFrame,

df.index.tolist()
# => [(1, 4, 10), (1, 4, 11), (1, 5, 12), (2, 5, 13), (2, 6, 14), (3, 7, 15)]

So here are two ideas:

  1. Get the list of tuples from the original MultiIndex and filter the result.

    [(a, c) for a, b, c in df.index.tolist()]
    # => [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]
    

    The disadvantage of this simple method is that you have you manually specify the order of the levels you want. You can leverage itertools.compress to select them by name instead.

    from itertools import compress
    
    mask = [1 if name in ['a', 'c'] else 0 for name in df.index.names]
    [tuple(compress(t, mask)) for t in df.index.tolist()]
    # => [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]
    
  2. Create a MultiIndex that has exactly the levels you want and call .tolist() on it.

    df.index.droplevel('b').tolist()
    # => [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]
    

    If you would prefer to name the levels you want to keep — instead of those that you want to drop — you could do something like

    df.index.droplevel([level for level in df.index.names
                    if not level in ['a', 'c']]).tolist()
    # => [(1, 10), (1, 11), (1, 12), (2, 13), (2, 14), (3, 15)]
    

Upvotes: 15

IanS
IanS

Reputation: 16251

This is less cumbersome insofar as you can pass the list of index names you want to select:

df.reset_index()[['a', 'c']].to_dict(orient='split')['data']

I have not found a way of selecting levels 'a' and 'b' from the index object directly, hence the use of reset_index.

Note that to_dict returns a list of lists and not tuples:

[[1, 10], [1, 11], [1, 12], [2, 13], [2, 14], [3, 15]]

Upvotes: 1

Related Questions