Joe Eggleston

Reputation: 11

pandas get integer indices where multiindex changes

I have a very large dataframe with a multiindex. I need to pass one column to C to do an operation quickly. For this operation, I need to know where the multiindex changes values. Since this is a large dataframe, I don't want to iterate over the rows or the index in Python. A small example:

import numpy as np
import pandas as pd
a = np.array([['bar', 'one', 0, 0],
       ['bar', 'two', 1, 2],
       ['bar', 'one', 2, 4],
       ['bar', 'two', 3, 6],
       ['foo', 'one', 4, 8],
       ['foo', 'two', 5, 10],
       ['bar', 'one', 6, 12],
       ['bar', 'two', 7, 14]], dtype=object)
df = pd.DataFrame(a, columns=['ix0', 'ix1', 'cd0', 'cd1'])
df.sort_values(['ix0', 'ix1'], inplace=True)
df.set_index(['ix0', 'ix1'], inplace=True)

The dataframe looks like this:

In [7]: df
Out[7]: 
        cd0 cd1
ix0 ix1        
bar one   0   0
    one   2   4
    one   6  12
    two   1   2
    two   3   6
    two   7  14
foo one   4   8
    two   5  10

Now I want an array or list that shows where the values in the multiindex change, i.e., the integer position where (bar, one) changes to (bar, two), where (bar, two) changes to (foo, one), and so on.

To be able to build the hierarchical output, it seems that this data must exist in the index. Is there a way to get to it?

The example output I'm looking for would be: [0, 3, 6, 7].

Thanks

Upvotes: 1

Views: 300

Answers (1)

unutbu

Reputation: 879661

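You could use np.unique with return_index=True. It returns the position of the first occurrence of each unique index value, which coincides with the change points here because the index is sorted: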

In [69]: uniques, indices = np.unique(df.index, return_index=True)

In [70]: indices
Out[70]: array([0, 3, 6, 7])
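
Note that np.unique sorts its input, which may be noticeable on a very large index. As a possible alternative, here is a minimal sketch that compares the integer level codes of adjacent rows instead. It assumes pandas 0.24 or later (where MultiIndex.codes is available), and the helper name multiindex_change_points is made up for illustration; it is not part of pandas.

```python
import numpy as np
import pandas as pd


def multiindex_change_points(index):
    """Return integer positions where consecutive MultiIndex entries differ.

    Position 0 is included for any non-empty index.
    (Hypothetical helper; not a pandas function.)
    """
    if len(index) == 0:
        return np.array([], dtype=np.intp)
    # One integer code per level per row, shape (n_rows, n_levels).
    codes = np.column_stack([np.asarray(c) for c in index.codes])
    # True where row i+1 differs from row i in at least one level.
    changed = np.any(codes[1:] != codes[:-1], axis=1)
    # Shift by one so each position points at the first row of a new run.
    return np.concatenate(([0], np.flatnonzero(changed) + 1))


# Same data as in the question.
a = np.array([['bar', 'one', 0, 0],
              ['bar', 'two', 1, 2],
              ['bar', 'one', 2, 4],
              ['bar', 'two', 3, 6],
              ['foo', 'one', 4, 8],
              ['foo', 'two', 5, 10],
              ['bar', 'one', 6, 12],
              ['bar', 'two', 7, 14]], dtype=object)
df = pd.DataFrame(a, columns=['ix0', 'ix1', 'cd0', 'cd1'])
df = df.sort_values(['ix0', 'ix1']).set_index(['ix0', 'ix1'])

starts = multiindex_change_points(df.index)
print(starts)  # positions 0, 3, 6, 7 for this data
```

This never sorts or builds tuples, so it does a single vectorized pass over the codes. Unlike np.unique, it reports the start of every run of equal values, so on an unsorted index the same key can appear more than once.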

Upvotes: 1
