Javier

Reputation: 188

Pandas: How to ensure unique header/column values?

I have a pandas DataFrame that was not well formatted, so I had to promote one of its rows (which contains duplicate values) to be the header. The problem is that the header now has duplicates, for example:

2.0, 2.0, 10.0, 10.0, ..., 10.0, 16.0, 16.0, 16.0, 21.0, 21.0, 21.0, ...

I want the header/column labels to be unique, like so:

2.0, 2.1, 10.0, 10.1, 10.2, 10.3, ... , 10.8, 10.9, 16.0, 16.1, 16.2, .... 

and so on.

The new values can go past X.9 if needed; for my purposes it doesn't matter if I get X.10, X.11, X.12, and so on.

I tried using df.columns = df.columns.unique() but then I got an error saying that

"ValueError: Length mismatch: Expected axis has 76 elements, new values have 37 elements".

I have also looked at other methods such as df.duplicated() and df.drop_duplicates(), but neither of those seems to provide what I am after.
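For reference, the length mismatch can be reproduced on a small frame. This is only a sketch: the 4-column frame below is a hypothetical stand-in for the real 76-column one. It shows that df.columns.unique() returns just the distinct labels, so there are fewer labels than columns to assign them to.

```python
import pandas as pd

# Hypothetical 4-column frame with duplicated labels (stand-in for the real data).
df = pd.DataFrame([[1, 2, 3, 4]], columns=[2.0, 2.0, 10.0, 10.0])

unique_labels = df.columns.unique()
print(len(df.columns), len(unique_labels))  # 4 2

try:
    # Fails: only 2 labels are offered for 4 columns.
    df.columns = unique_labels
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```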

Thanks!

Upvotes: 1

Views: 1744

Answers (3)

EshaBhide

Reputation: 1

You can use something like this:

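l = [10, 10, 10, 18, 18, 19, 20, 21, 19, 20]  # example labels; in practice use list(df.columns)
fin = []
d = {}  # how many times each label has been seen so far
for i in l:
    n = d.get(i, 0)  # 0 for the first occurrence, so it becomes X.0
    fin.append(f"{int(i)}.{n}")  # string labels, so X.10 stays distinct from X.1
    d[i] = n + 1
df.columns = fin  # len(fin) must equal the number of columns in df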

Upvotes: 0

BENY

Reputation: 323316

You can use cumcount:

s=samepledf.columns.to_series()
samepledf.columns=s.astype(int).astype(str)+'.'+s.groupby(s).cumcount().astype(str)

samepledf
Out[199]: 
   2.0   2.1   10.0  10.1
0     1     1     1     1
1     1     1     1     1
2     1     1     1     1
3     1     1     1     1

Data Sample

import pandas as pd
samepledf=pd.DataFrame(data=[[1,1,1,1],[1,1,1,1],[1,1,1,1],[1,1,1,1]],columns=[2.0, 2.0, 10.0, 10.0])
samepledf
Out[192]: 
   2.0   2.0   10.0  10.0
0     1     1     1     1
1     1     1     1     1
2     1     1     1     1
3     1     1     1     1
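The same cumcount idea can be wrapped in a small helper. This is a minimal sketch, not a definitive implementation: the name dedupe_columns and the sample frame are made up for illustration, and it assumes every label is a whole number such as 2.0 or 10.0. The new labels are strings, so X.10 stays distinct from X.1.

```python
import pandas as pd


def dedupe_columns(df):
    """Return a copy of df whose column labels are made unique with a .N suffix.

    Hypothetical helper; assumes all labels are whole numbers (e.g. 2.0, 10.0).
    """
    # Positional index, so duplicated labels do not interfere with alignment.
    s = df.columns.to_series().reset_index(drop=True)
    # 0 for the first occurrence of a label, 1 for the second, and so on.
    occurrence = s.groupby(s).cumcount()
    out = df.copy()
    out.columns = [f"{int(base)}.{n}" for base, n in zip(s, occurrence)]
    return out


# Hypothetical sample with duplicated labels.
sample = pd.DataFrame([[1, 1, 1, 1, 1, 1]],
                      columns=[2.0, 2.0, 10.0, 10.0, 10.0, 16.0])
new_labels = list(dedupe_columns(sample).columns)
print(new_labels)  # ['2.0', '2.1', '10.0', '10.1', '10.2', '16.0']
```

Returning a copy leaves the original frame untouched, which makes it easy to compare the labels before and after.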

Upvotes: 3

Mali Akmanalp

Reputation: 141

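Assigning to columns only renames them. If you just want to subset the columns, and it doesn't matter which of the identically named columns you keep, select the first occurrence of each label (selecting with df[df.columns.unique()] would return every column carrying those labels, duplicates included):

df = df.loc[:, ~df.columns.duplicated()]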
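As a quick check of how subsetting behaves with duplicated labels, here is a small sketch on a hypothetical 4-column frame. Selecting by label returns every column that carries the label, whereas a boolean mask built from columns.duplicated() keeps only the first occurrence of each.

```python
import pandas as pd

# Hypothetical frame with duplicated labels and distinguishable values.
df = pd.DataFrame([[1, 2, 3, 4]], columns=[2.0, 2.0, 10.0, 10.0])

# Label-based selection matches every column with that label.
by_label = df[df.columns.unique()]
print(by_label.shape[1])  # 4

# duplicated() marks repeats of a label; negating it keeps first occurrences.
keep = ~df.columns.duplicated()
first_only = df.loc[:, keep]
print(list(first_only.columns))  # [2.0, 10.0]
```

Note that this discards the data in the repeated columns, so it only fits cases where the duplicates are truly redundant.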

Upvotes: 0
