Reputation: 429
I would like to have a unique index value, instead of having the same one repeated many times.
Example: I have this dataframe:
import pandas as pd

test = pd.DataFrame({'id': ['a', 'a', 'a', 'a', 'b'],
                     'col_1': [1, 2, 3, 4, 5],
                     'col_2': [6, 7, 8, 9, 10]})
  id  col_1  col_2
0  a      1      6
1  a      2      7
2  a      3      8
3  a      4      9
4  b      5     10
What I want is to use the id column as the index without the value being repeated. I tried this, but as you can see, the index value is repeated in every row:
test.set_index('id')
    col_1  col_2
id
a       1      6
a       2      7
a       3      8
a       4      9
b       5     10
What I would like to get is this (the index 'a' shown once for the whole group of 4 rows, and so on):
    col_1  col_2
id
a       1      6
        2      7
        3      8
        4      9
b       5     10
Any ideas how to do it? Thanks in advance.
Upvotes: 6
Views: 2847
Reputation: 5774
You can set the id column as the index. To avoid duplicate index entries, also pass the existing index as the second level, which gives a MultiIndex:
test.set_index(['id', test.index])
# Out:
      col_1  col_2
id
a  0      1      6
   1      2      7
   2      3      8
   3      4      9
b  4      5     10
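As a small sketch of how the resulting frame can be used (the variable names indexed, group_a and single are only illustrative), the two-level index stays unique while rows can still be selected by id alone:

```python
import pandas as pd

test = pd.DataFrame({'id': ['a', 'a', 'a', 'a', 'b'],
                     'col_1': [1, 2, 3, 4, 5],
                     'col_2': [6, 7, 8, 9, 10]})

# The second level is the original 0..4 index, so each (id, position) pair is unique.
indexed = test.set_index(['id', test.index])

print(indexed.index.is_unique)

# Selecting on the first level returns the whole group, indexed by the second level.
group_a = indexed.loc['a']
print(group_a)

# A full (id, position) tuple addresses exactly one row.
single = indexed.loc[('b', 4)]
print(single)
```

Selecting with only the first level drops that level from the result, so group_a is indexed by the original row positions.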
If you really don't want the extra level that removes the duplicates, simply set id as the index. Keep in mind that in this case pandas will display the duplicated values:
test.set_index('id')
# Out:
    col_1  col_2
id
a       1      6
a       2      7
a       3      8
a       4      9
b       5     10
Also, test.set_index('id').index.duplicated().any() will yield True, with the usual drawbacks of an index that contains duplicates.
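To make those drawbacks a bit more concrete, here is a minimal sketch (dup and rows_a are illustrative names). It shows that a label lookup no longer identifies a single row, and that reindexing such a frame is rejected in the pandas versions I am aware of:

```python
import pandas as pd

test = pd.DataFrame({'id': ['a', 'a', 'a', 'a', 'b'],
                     'col_1': [1, 2, 3, 4, 5],
                     'col_2': [6, 7, 8, 9, 10]})

dup = test.set_index('id')

# The index is not unique, so a label may refer to several rows.
print(dup.index.is_unique)
print(dup.index.duplicated().any())

# Looking up 'a' returns all four matching rows rather than one.
rows_a = dup.loc['a']
print(len(rows_a))

# Reindexing needs unique labels; pandas is expected to raise here.
try:
    dup.reindex(['a', 'b'])
except ValueError as exc:
    print('reindex failed:', exc)
```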
Upvotes: 7
Reputation: 862511
If you only want to replace the duplicated values with '' for display, you can do the following. Keeping the duplicated index values is better, though, if you need the index for later processing:
df = test.set_index('id')
df1 = df.set_index(df.index.where(~df.index.duplicated(), ''))
print(df1)
    col_1  col_2
id
a       1      6
        2      7
        3      8
        4      9
b       5     10
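A short sketch of why this is best treated as display-only (blank_rows and n_a are illustrative names): after blanking, the index no longer records which rows belong to 'a'.

```python
import pandas as pd

test = pd.DataFrame({'id': ['a', 'a', 'a', 'a', 'b'],
                     'col_1': [1, 2, 3, 4, 5],
                     'col_2': [6, 7, 8, 9, 10]})

df = test.set_index('id')
# Keep the first occurrence of each label and blank out the repeats.
df1 = df.set_index(df.index.where(~df.index.duplicated(), ''))

print(list(df1.index))

# The three blanked rows now share the label '' instead of 'a'.
blank_rows = df1.loc['']
print(len(blank_rows))

# Only one row is still labelled 'a', so the group can no longer be selected by label.
n_a = int((df1.index == 'a').sum())
print(n_a)
```

If the grouping is needed later, keep df (or the MultiIndex version) around and use df1 only for printing.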
Upvotes: 1