Reputation: 7506
Is there an equivalent of pandas.DataFrame.reset_index() that operates on the columns and can handle duplicate column names? I want it to throw away the column names and return a default numbered index 0, 1, 2, ... for the columns. (Methods like df.rename or df.reindex_axis do not work when I have duplicate column names.)
Sample input:
pd.DataFrame(np.random.rand(5, 3), columns = ['A', 'A', 'B'])
     A    A    B
0  0.5  0.3  0.9
1  0.7  0.9  0.3
2  0.9  0.4  0.8
3  0.6  0.2  0.9
4  0.7  0.4  0.6
Expected output:
     0    1    2
0  0.8  0.1  0.2
1  0.4  0.2  0.4
2  0.3  0.3  0.4
3  0.4  0.1  0.8
4  1.0  0.9  0.9
Upvotes: 9
Views: 5503
Reputation: 210942
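You can use the set_axis() method. In current pandas versions it takes the new labels first and the axis as a keyword, and it returns a new DataFrame rather than modifying df in place, so assign the result back:
In [54]: df
Out[54]:
          A         A         B
0  0.934900  0.817182  0.166270
1  0.064543  0.139431  0.249576
2  0.709349  0.731913  0.965048
3  0.284955  0.479898  0.496652
4  0.520749  0.464256  0.999993
In [55]: df = df.set_axis(range(len(df.columns)), axis=1)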
In [56]: df
Out[56]:
          0         1         2
0  0.934900  0.817182  0.166270
1  0.064543  0.139431  0.249576
2  0.709349  0.731913  0.965048
3  0.284955  0.479898  0.496652
4  0.520749  0.464256  0.999993
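For reference, here is a minimal self-contained sketch of the set_axis() approach, assuming a reasonably recent pandas (1.0 or later) where the labels are the first argument and the call returns a new object. The variable name renumbered is only illustrative.

```python
import numpy as np
import pandas as pd

# Frame with duplicate column labels, as in the question.
df = pd.DataFrame(np.random.rand(5, 3), columns=["A", "A", "B"])

# Labels first, axis by keyword; the result is a new DataFrame.
renumbered = df.set_axis(range(df.shape[1]), axis=1)

print(renumbered.columns.tolist())  # the new positional labels
print(df.columns.tolist())          # the original labels are unchanged
```

Because nothing is looked up by name, the duplicate 'A' labels are not a problem here.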
Upvotes: 5
Reputation: 863341
Use range with the number of columns taken from shape:
df.columns = range(df.shape[1])
print(df)
          0         1         2
0  0.228080  0.884450  0.753401
1  0.176790  0.741979  0.525305
2  0.680255  0.730258  0.449681
3  0.169420  0.660825  0.986554
4  0.302204  0.040413  0.902899
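Note that assigning to df.columns changes df itself. If the original labels should be kept around, the same idea can be wrapped in a small helper that works on a copy. This is only a sketch; reset_columns is a hypothetical name, not a pandas function.

```python
import numpy as np
import pandas as pd


def reset_columns(frame):
    """Return a copy of `frame` whose columns are numbered 0..n-1."""
    out = frame.copy()
    # Positional assignment, so duplicate labels are irrelevant.
    out.columns = range(out.shape[1])
    return out


df = pd.DataFrame(np.random.rand(5, 3), columns=["A", "A", "B"])
numbered = reset_columns(df)

print(numbered.columns.tolist())
print(df.columns.tolist())  # original labels are still in place
```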
Another solution is a double transpose with T, calling reset_index with the parameter drop=True in between:
df = df.T.reset_index(drop=True).T
print(df)
          0         1         2
0  0.024846  0.688193  0.887926
1  0.284681  0.895319  0.142876
2  0.440834  0.299527  0.762815
3  0.936967  0.928907  0.642960
4  0.801077  0.085773  0.866651
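One caveat with the double transpose: it is fine for an all-float frame like the one in the question, but with mixed column types the transpose goes through object dtype and the original dtypes are not restored. The sketch below uses a made-up two-column frame (an assumption for illustration, not data from the question) to compare it with direct assignment.

```python
import pandas as pd

# Hypothetical mixed-dtype frame with duplicate column labels.
mixed = pd.DataFrame({"x": [1, 2], "y": ["a", "b"]})
mixed.columns = ["A", "A"]

# Direct assignment on a copy keeps each column's dtype.
direct = mixed.copy()
direct.columns = range(direct.shape[1])

# The double transpose passes through an object-dtype frame.
roundtrip = mixed.T.reset_index(drop=True).T

print(direct.dtypes)     # integer column stays an integer column
print(roundtrip.dtypes)  # columns come back as object
```

If the dtypes matter, assigning to df.columns or using set_axis() avoids the issue.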
Upvotes: 5