Use column with duplicated values as data frame index in Pandas

Question

I would like to set the index of a data frame using a column that contains duplicated values. Is there a way for Pandas to automatically add a second index level that increments each time the first-level value repeats?

For example:

   ID              name  company           position
   ------------------------------------------------
0  23      Alex Monoson   Coobit      Sales manager
1  12    Johnny Johnson   Coobit  Marketing manager
2  62         Hans Dupa    Pesik  Marketing manager
3  31    Jessica Heiler  Montino           Engineer
4  92  Dominic Alvorine  Montino                CFO
5  16           Hei Lee   Coobit                CEO

I would like to use company as the index, with a second integer index level that counts the rows within each company.

My expected output:

                    ID    name    position
company
------------------------------------------
Coobit      0       blah  blah        blah
Coobit      1       blah  blah        blah
Coobit      2       blah  blah        blah
Pesik       0       blah  blah        blah
Montino     0       blah  blah        blah
Montino     1       blah  blah        blah

BENY · Accepted Answer

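We can use groupby with cumcount, which numbers the rows within each company starting from 0, and then set both columns as the index: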

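# Number the rows within each company: 0, 1, 2, ...
df['index2'] = df.groupby('company').cumcount()
# Build a two-level index from company and the per-company counter
df = df.set_index(['company', 'index2']).sort_index()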
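
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample table from the question and applies the same two steps. The column name index2 comes from the answer; the variable name result is only an illustrative choice.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [23, 12, 62, 31, 92, 16],
    "name": ["Alex Monoson", "Johnny Johnson", "Hans Dupa",
             "Jessica Heiler", "Dominic Alvorine", "Hei Lee"],
    "company": ["Coobit", "Coobit", "Pesik", "Montino", "Montino", "Coobit"],
    "position": ["Sales manager", "Marketing manager", "Marketing manager",
                 "Engineer", "CFO", "CEO"],
})

# cumcount() numbers the rows of each group in order of appearance, from 0
df["index2"] = df.groupby("company").cumcount()

# Use (company, index2) as a two-level index and sort by it
result = df.set_index(["company", "index2"]).sort_index()

print(result)
```

Note that sort_index() orders the companies alphabetically, so Montino comes before Pesik here, whereas the expected output in the question lists Pesik first. Dropping sort_index() keeps the original row order instead, which leaves the Coobit rows non-contiguous.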
