PedroA

Reputation: 1925

How to get a properly formatted index in a pandas DataFrame

Given a DataFrame like this:

>>> df = pd.DataFrame({'name': ['foo', 'foo', 'bar', 'bar'],
                   'colx': [1, 2, 3, 4],
                   'coly': [5, 6, 7, 8]})
>>> df.set_index('name', inplace=True)
>>> df
      colx  coly
name            
foo      1     5
foo      2     6
bar      3     7
bar      4     8

how can I get a properly formatted index like this:

      colx  coly
name            
foo      1     5
         2     6
bar      3     7
         4     8

so that pandas doesn't complain about duplicate index values?

Upvotes: 3

Views: 142

Answers (1)

MaxU - stand with Ukraine

Reputation: 210882

One option (among many) is to add a new index level:

In [49]: df = df.set_index(df.groupby(level=0).cumcount().add(1) \
                             .to_frame('num')['num'],
                           append=True)

In [50]: df
Out[50]:
          colx  coly
name num
foo  1       1     5
     2       2     6
bar  1       3     7
     2       4     8
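As a self-contained check of the idea above, here is a minimal sketch that rebuilds the question's frame, adds the per-group counter as a second index level, and confirms that the combined index is unique. The variable names `num` and `unique_df` are my own, and `.rename('num')` is used as a shorter stand-in for `.to_frame('num')['num']`; both only give the counter Series a name.

```python
import pandas as pd

df = pd.DataFrame({'name': ['foo', 'foo', 'bar', 'bar'],
                   'colx': [1, 2, 3, 4],
                   'coly': [5, 6, 7, 8]}).set_index('name')

# Running counter within each 'name' group, starting at 1
num = df.groupby(level=0).cumcount().add(1).rename('num')

# Append the counter as a second index level next to 'name'
unique_df = df.set_index(num, append=True)

print(df.index.is_unique)         # the original index contains duplicates
print(unique_df.index.is_unique)  # each (name, num) pair occurs only once
print(unique_df.loc[('foo', 2), 'colx'])  # select a single row by both levels
```

With both levels in place, a `(name, num)` tuple identifies exactly one row, which is what removes the ambiguity of the duplicated `name` labels.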

UPDATE: don't be confused by the way pandas displays repeated values in a multi-index. If we select all values of the name level, we still see the duplicates:

In [51]: df.index.get_level_values(0)
Out[51]: Index(['foo', 'foo', 'bar', 'bar'], dtype='object', name='name')

This is just how pandas displays repeated values in the outer levels of a multi-index. We can switch this display option off:

In [53]: pd.options.display.multi_sparse = False

In [54]: df
Out[54]:
          colx  coly
name num
foo  1       1     5
foo  2       2     6
bar  1       3     7
bar  2       4     8

In [55]: pd.options.display.multi_sparse = True

In [56]: df
Out[56]:
          colx  coly
name num
foo  1       1     5
     2       2     6
bar  1       3     7
     2       4     8

PS: this option doesn't change the index values; it only affects how multi-indices are displayed.
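If you only want the non-sparse display for a single printout, `pd.option_context` can set the option temporarily instead of toggling it globally. The following is a small sketch, assuming the default value of `display.multi_sparse` (True) is in effect beforehand; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'colx': [1, 2, 3, 4], 'coly': [5, 6, 7, 8]},
    index=pd.MultiIndex.from_arrays(
        [['foo', 'foo', 'bar', 'bar'], [1, 2, 1, 2]],
        names=['name', 'num']),
)

# Default display: repeated outer labels are shown only once
sparse_repr = repr(df)

# Inside the block every label is printed; the option is restored on exit
with pd.option_context('display.multi_sparse', False):
    dense_repr = repr(df)

print(sparse_repr)
print(dense_repr)

# The underlying index values are the same either way
print(list(df.index.get_level_values('name')))
```

Only the text representation differs between the two printouts; the index itself still holds 'foo' and 'bar' twice each.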

Upvotes: 2
