Sakura Maozi

Reputation: 13

Remove duplicate columns in pandas

I am trying to delete columns that contain duplicate data in pandas. For example, in the following data the columns one and three hold the same values under different column names:

import pandas as pd

df1 = pd.DataFrame({'one': [1, 2, 3, 4], 'two': ['a', 'b', 'c', 'd'], 'three': [1, 2, 3, 4]})
   one two  three
0    1   a      1
1    2   b      2
2    3   c      3
3    4   d      4

I want to get this result:

  one two
0   1   a
1   2   b
2   3   c
3   4   d

The method I use now is:

df2 = df1.T.drop_duplicates().T
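One side effect of this round trip: because df1 mixes integer and string columns, df1.T has to store everything as object, and transposing back does not restore the original dtypes. A quick check, as a sketch (infer_objects is a standard DataFrame method; the exact dtypes printed may vary by pandas version):

```python
import pandas as pd

df1 = pd.DataFrame({'one': [1, 2, 3, 4], 'two': ['a', 'b', 'c', 'd'], 'three': [1, 2, 3, 4]})

# Transposing a mixed-type frame forces a single object dtype,
# and transposing back keeps it.
raw = df1.T.drop_duplicates().T
print(raw.dtypes)  # both remaining columns are object

# infer_objects() re-infers a better dtype per column ('one' becomes int64).
fixed = raw.infer_objects()
print(fixed.dtypes)
```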

But this is too slow, since it transposes the whole DataFrame. Is there a more efficient way?

Any help would be appreciated, thanks!
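For comparison, a transpose-free sketch (not from the original post; drop_duplicate_columns is a hypothetical helper name): compare each column only against the columns already kept, using Series.equals. This assumes Series.equals ignores the Series name but requires the same dtype, so an int column and an equal-valued float column count as different. It has not been benchmarked against the transpose, and it does up to k*(k-1)/2 column comparisons for k columns.

```python
import pandas as pd


def drop_duplicate_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the first column of each group of columns with identical contents.

    Hypothetical helper: columns are compared pairwise with Series.equals,
    so each kept column retains its own dtype (no object-dtype transpose).
    """
    kept = []  # integer positions of the columns kept so far
    for i in range(df.shape[1]):
        col = df.iloc[:, i]
        # Series.equals checks index, dtype and values (NaNs in the same
        # positions count as equal); the column name is not compared.
        if not any(col.equals(df.iloc[:, j]) for j in kept):
            kept.append(i)
    return df.iloc[:, kept]


df1 = pd.DataFrame({'one': [1, 2, 3, 4], 'two': ['a', 'b', 'c', 'd'], 'three': [1, 2, 3, 4]})
res = drop_duplicate_columns(df1)
print(res)
```

Positions rather than labels are used so the helper still works if the frame has repeated column labels.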

Upvotes: 1

Views: 107

Answers (1)

Mayank Porwal

Reputation: 34046

I tried to improve the efficiency a little like this, transposing only the integer columns:

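import pandas as pd

# Split the frame into integer columns and all other columns
df_int = df1.select_dtypes(include=['int'])
df_other = df1.select_dtypes(exclude=['int'])

# If every integer column holds the same values, the transposed integer block
# has only one unique row, so keep just the first integer column
if df_int.T.drop_duplicates().shape[0] == 1:
    res = pd.concat([df_int.iloc[:, 0], df_other], axis=1)
else:
    res = df1  # the integer columns differ; this shortcut drops nothing

print(res)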
   one two
0    1   a
1    2   b
2    3   c
3    4   d
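The check above only handles the case where all integer columns are identical. A sketch that drops duplicates inside the numeric block in general, using np.unique with axis=1 (an addition, not part of the original answer). Assumptions: np.unique's axis argument does not support object arrays, so non-numeric columns are simply kept; mixing int and float columns makes to_numpy() upcast to float, so [1, 2] and [1.0, 2.0] would count as duplicates; and at least one numeric column exists.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'one': [1, 2, 3, 4], 'two': ['a', 'b', 'c', 'd'], 'three': [1, 2, 3, 4]})

# Split as in the answer, but select every numeric dtype via np.number.
df_num = df1.select_dtypes(include=[np.number])
df_other = df1.select_dtypes(exclude=[np.number])

# With axis=1 each column is treated as one item; return_index gives the
# position of the first occurrence of every distinct column.
_, first_pos = np.unique(df_num.to_numpy(), axis=1, return_index=True)
keep = set(df_num.columns[np.sort(first_pos)]) | set(df_other.columns)

# Select from df1 to preserve the original column order (assumes unique labels).
res = df1[[c for c in df1.columns if c in keep]]
print(res)
```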

To avoid the transpose completely, you can compare adjacent integer columns with NumPy instead:

import numpy as np
# All integer columns are identical only if every adjacent-column difference is 0
if (np.diff(df_int.to_numpy(), axis=1) == 0).all():
    res = pd.concat([df_int.iloc[:, 0], df_other], axis=1)
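A check that only requires the column-wise differences to sum to zero is weaker than one requiring every difference to be zero, because positive and negative differences can cancel. A minimal counterexample with hypothetical columns a, b and c, where a and c match but b does not:

```python
import numpy as np
import pandas as pd

# Hypothetical integer columns: 'a' and 'c' match, 'b' is different.
df_int = pd.DataFrame({'a': [1, 2], 'b': [2, 1], 'c': [1, 2]})
d = np.diff(df_int.to_numpy(), axis=1)  # [[1, -1], [-1, 1]]

sum_check = bool((pd.DataFrame(d).sum() == 0).all())  # True: +1 and -1 cancel
all_zero_check = bool((d == 0).all())                 # False: 'b' differs
print(sum_check, all_zero_check)  # True False
```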

Upvotes: 1
