Given a DataFrame like this:
>>> import pandas as pd
>>> df = pd.DataFrame({'name': ['foo', 'foo', 'bar', 'bar'],
...                    'colx': [1, 2, 3, 4],
...                    'coly': [5, 6, 7, 8]})
>>> df.set_index('name', inplace=True)
>>> df
      colx  coly
name
foo      1     5
foo      2     6
bar      3     7
bar      4     8
How can I get a properly formatted index like this:
      colx  coly
name
foo      1     5
         2     6
bar      3     7
         4     8
so that pandas doesn't complain about duplicate index labels?
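As an aside, a quick way to confirm that the name index really is non-unique, and to see which rows pandas treats as repeats, is `Index.is_unique` together with `Index.duplicated()`. A small sketch that rebuilds the example frame:

```python
import pandas as pd

# Rebuild the example frame with a non-unique 'name' index
df = pd.DataFrame({'name': ['foo', 'foo', 'bar', 'bar'],
                   'colx': [1, 2, 3, 4],
                   'coly': [5, 6, 7, 8]}).set_index('name')

# False: 'foo' and 'bar' each appear twice
print(df.index.is_unique)

# duplicated() marks every repeat after the first occurrence (keep='first')
print(df.index.duplicated().tolist())  # [False, True, False, True]
```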
One option (among many) is to add a new index level that numbers the rows within each name group:
In [49]: df = df.set_index(df.groupby(level=0).cumcount().add(1)
    ...:                     .to_frame('num')['num'],
    ...:                   append=True)
In [50]: df
Out[50]:
          colx  coly
name num
foo  1       1     5
     2       2     6
bar  1       3     7
     2       4     8
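A self-contained sketch of the same idea, for running outside IPython. One assumption differs from the answer: it names the counter with `Series.rename('num')` instead of `.to_frame('num')['num']`. Both should yield a Series named num, and the name num itself is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({'name': ['foo', 'foo', 'bar', 'bar'],
                   'colx': [1, 2, 3, 4],
                   'coly': [5, 6, 7, 8]}).set_index('name')

# Number the rows within each 'name' group, starting at 1: [1, 2, 1, 2]
num = df.groupby(level=0).cumcount().add(1).rename('num')

# Append the counter as a second index level; (name, num) pairs are now unique
df = df.set_index(num, append=True)

print(df.index.is_unique)           # True
print(df.loc[('foo', 2)].tolist())  # [2, 6]
```

Because the Series passed to set_index has a name, that name becomes the new level name, and a full (name, num) tuple now addresses exactly one row.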
UPDATE: don't be confused by the way pandas displays repeated labels in a MultiIndex. If we select all values of the name level, we still see the duplicates:
In [51]: df.index.get_level_values(0)
Out[51]: Index(['foo', 'foo', 'bar', 'bar'], dtype='object', name='name')
This is only how pandas renders a MultiIndex: repeated outer-level labels are hidden ("sparsified") for readability. We can switch this display option off:
In [53]: pd.options.display.multi_sparse = False
In [54]: df
Out[54]:
          colx  coly
name num
foo  1       1     5
foo  2       2     6
bar  1       3     7
bar  2       4     8
In [55]: pd.options.display.multi_sparse = True
In [56]: df
Out[56]:
          colx  coly
name num
foo  1       1     5
     2       2     6
bar  1       3     7
     2       4     8
PS: this option doesn't change the index values; it only affects how a MultiIndex is displayed.
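If the unsparsified view is only needed for a single printout, `pd.option_context` can set display.multi_sparse temporarily and restore the previous value when the with block exits, instead of toggling `pd.options` back by hand. A minimal sketch that rebuilds the two-level frame so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({'name': ['foo', 'foo', 'bar', 'bar'],
                   'colx': [1, 2, 3, 4],
                   'coly': [5, 6, 7, 8]}).set_index('name')
df = df.set_index(df.groupby(level=0).cumcount().add(1).rename('num'),
                  append=True)

# Show every label on every row, but only inside this block
with pd.option_context('display.multi_sparse', False):
    print(df)

# The setting is restored on exit (True is the default unless changed elsewhere)
print(pd.get_option('display.multi_sparse'))
```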