Reputation: 188
I have a Pandas df that was not well formatted, so I had to force the header/column values to be one of the rows of my original df
(which has duplicate values). The problem is that the header now has duplicates, e.g.:
2.0, 2.0, 10.0, 10.0, ..., 10.0, 16.0, 16.0, 16.0, 21.0, 21.0, 21.0, ...
I want to ensure the header/columns values have unique values like so:
2.0, 2.1, 10.0, 10.1, 10.2, 10.3, ... , 10.8, 10.9, 16.0, 16.1, 16.2, ....
and so on.
The new values can exceed X.9 if needed; for my purposes it doesn't matter if I get X.10, X.11, X.12, and so on.
I tried using df.columns = df.columns.unique(), but then I got this error:
"ValueError: Length mismatch: Expected axis has 76 elements, new values have 37 elements".
I have also looked at other methods such as df.duplicated() and df.drop_duplicates(),
but neither of those seems to provide what I am after.
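A minimal sketch that reproduces the length mismatch, assuming a small hypothetical frame with four columns rather than the original 76:

```python
import pandas as pd

# Hypothetical 2x4 frame with duplicated float column labels.
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=[2.0, 2.0, 10.0, 10.0])

unique_labels = df.columns.unique()
n_columns = len(df.columns)    # number of columns in the frame
n_unique = len(unique_labels)  # number of distinct labels

try:
    # Fewer labels than columns, so the rename is rejected.
    df.columns = unique_labels
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

unique() drops the repeated labels, so there are fewer new names than columns to rename.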
Thanks!
Upvotes: 1
Views: 1744
Reputation: 1
You can use something like this:
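    l = [10, 10, 10, 18, 18, 19, 20, 21, 19, 20]
    fin = []
    d = {}  # how many times each value has been seen so far
    for i in l:
        n = d.get(i, 0)
        # Build a string label so the first occurrence is X.0 and the
        # eleventh is X.10; adding 0.1 as a float would turn 10 + 1.0 into 11.0.
        fin.append(f"{int(i)}.{n}")
        d[i] = n + 1
    # l must contain one entry per column of df
    df.columns = fin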
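The counting idea can also be wrapped in a small reusable function. This is only a sketch: the name dedupe_labels is hypothetical, it assumes the labels are whole numbers such as 10 or 10.0, and it returns strings rather than floats so that more than ten duplicates still give distinct labels.

```python
from collections import defaultdict


def dedupe_labels(labels):
    """Return string labels 'X.n', where n counts earlier occurrences of X."""
    seen = defaultdict(int)  # label -> occurrences seen so far
    out = []
    for label in labels:
        out.append(f"{int(label)}.{seen[label]}")
        seen[label] += 1
    return out


result = dedupe_labels([10, 10, 10, 18, 18, 19, 20, 21, 19, 20])
# Eleven copies of the same label run past X.9 to X.10.
long_run = dedupe_labels([10.0] * 11)
```

The result could then be assigned with df.columns = dedupe_labels(df.columns), provided the frame's labels meet the assumption above.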
Upvotes: 0
Reputation: 323316
You can use cumcount:
s=samepledf.columns.to_series()
samepledf.columns=s.astype(int).astype(str)+'.'+s.groupby(s).cumcount().astype(str)
samepledf
Out[199]:
2.0 2.1 10.0 10.1
0 1 1 1 1
1 1 1 1 1
2 1 1 1 1
3 1 1 1 1
Data sample:
import pandas as pd
samepledf=pd.DataFrame(data=[[1,1,1,1],[1,1,1,1],[1,1,1,1],[1,1,1,1]],columns=[2.0, 2.0, 10.0, 10.0])
samepledf
Out[192]:
2.0 2.0 10.0 10.0
0 1 1 1 1
1 1 1 1 1
2 1 1 1 1
3 1 1 1 1
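A sketch of the same cumcount idea as a helper, checked against a label that repeats more than ten times. The function name make_columns_unique is hypothetical, the labels are assumed to be whole-number floats, and pd.Series(frame.columns) is used in place of to_series() so the intermediate Series has a plain positional index.

```python
import pandas as pd


def make_columns_unique(frame):
    """Return a copy of frame whose columns are renamed to 'X.n' strings."""
    labels = pd.Series(frame.columns)
    # cumcount numbers each occurrence within its group: 0, 1, 2, ...
    counts = labels.groupby(labels).cumcount()
    new_labels = labels.astype(int).astype(str) + "." + counts.astype(str)
    out = frame.copy()
    out.columns = new_labels.tolist()
    return out


cols = [2.0, 2.0] + [10.0] * 11
sample = pd.DataFrame([list(range(len(cols)))], columns=cols)
renamed = make_columns_unique(sample)
```

The new labels are strings, so later lookups need renamed["10.3"] rather than renamed[10.3].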
Upvotes: 3
Reputation: 141
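Assigning to columns is only for renaming, so the new list must have one name per column. If you just want to subset the columns and you know you can take either of the duplicately named columns, note that selecting by label (df[df.columns.unique()]) returns every column that carries each label, so the duplicates come back. To keep only the first column for each name, use a boolean mask instead:
df = df.loc[:, ~df.columns.duplicated()]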
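It is worth checking the subsetting behaviour on a small frame. The sketch below uses a hypothetical one-row frame and compares label-based selection with a boolean mask built from columns.duplicated().

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]], columns=[2.0, 2.0, 10.0, 10.0])

# Selecting by label returns every column carrying that label,
# so the duplicated columns are all still present.
by_label = df[df.columns.unique()]

# duplicated() marks the second and later occurrences of a label;
# negating it keeps only the first column for each label.
first_only = df.loc[:, ~df.columns.duplicated()]
```

Either way, the data in the dropped columns is discarded, so this only fits when the duplicated columns are interchangeable.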
Upvotes: 0