Reputation: 13
I'm trying to delete columns that contain duplicate data in pandas. For example, in the following data, columns one and three hold the same values under different names:
df1 = pd.DataFrame({'one': [1, 2, 3, 4], 'two': ['a', 'b', 'c', 'd'], 'three': [1, 2, 3, 4]})
   one two  three
0    1   a      1
1    2   b      2
2    3   c      3
3    4   d      4
I'd like to get this result:
   one two
0    1   a
1    2   b
2    3   c
3    4   d
The method I use now is:
df2 = df1.T.drop_duplicates().T
But this is too inefficient. Is there a better way?
Any help would be appreciated, thanks.
Upvotes: 1
Views: 107
Reputation: 34046
I tried to improve the efficiency a little by transposing only the integer columns:
In [935]: df_int = df1.select_dtypes(include=['int'])
In [933]: df_other = df1.select_dtypes(exclude=['int'])
In [949]: if df_int.T.drop_duplicates().shape[0] == 1:
     ...:     res = pd.concat([df_int.iloc[:,0], df_other], axis=1)
     ...:
In [950]: res
Out[950]:
   one two
0    1   a
1    2   b
2    3   c
3    4   d
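Note that the check above only covers the case where all integer columns are identical. As a rough, unbenchmarked sketch, the same split-by-dtype idea can be generalized by deduplicating inside each dtype group, so that a transpose never mixes dtypes. The loop and the final reordering step are only an illustration, and it assumes the column names are unique:

```python
import pandas as pd

df1 = pd.DataFrame({'one': [1, 2, 3, 4],
                    'two': ['a', 'b', 'c', 'd'],
                    'three': [1, 2, 3, 4]})

parts = []
for dtype in df1.dtypes.unique():
    # Select only the columns that share this dtype
    block = df1.loc[:, df1.dtypes == dtype]
    # Transposing a single-dtype block keeps its dtype intact
    parts.append(block.T.drop_duplicates().T)

res = pd.concat(parts, axis=1)
# Restore the original column order (assumes unique column names)
res = res[[c for c in df1.columns if c in res.columns]]
```

Columns with equal values but different dtypes (for example int and float) land in different groups, so this sketch treats them as distinct.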
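To avoid the transpose completely, you can compare adjacent columns with np.diff and require every difference to be zero (summing the differences is not enough, because positive and negative values can cancel out):
In [995]: import numpy as np
In [997]: if (np.diff(df_int.values, axis=1) == 0).all():
     ...:     res = pd.concat([df_int.iloc[:,0], df_other], axis=1)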
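If more than one group of duplicated columns can occur, a more general sketch is to compare each column against the columns already kept using Series.equals, which needs no transpose at all. The helper name drop_duplicate_columns is hypothetical, and the sketch assumes unique column names. It makes up to one comparison per pair of columns, so it is likely to suit frames with many rows and few columns rather than very wide ones; I have not benchmarked it:

```python
import pandas as pd


def drop_duplicate_columns(df):
    """Return df without columns whose values repeat an earlier column.

    Hypothetical helper for illustration; assumes unique column names.
    """
    keep = []  # names of the columns kept so far
    for name in df.columns:
        col = df[name]
        # Series.equals requires matching dtype and values (the Series name
        # is ignored), and treats NaNs in the same position as equal
        if not any(col.equals(df[kept]) for kept in keep):
            keep.append(name)
    return df[keep]


df1 = pd.DataFrame({'one': [1, 2, 3, 4],
                    'two': ['a', 'b', 'c', 'd'],
                    'three': [1, 2, 3, 4]})
res = drop_duplicate_columns(df1)
```

Because Series.equals also compares dtypes, an int column and a float column holding the same numbers are kept as separate columns here.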
Upvotes: 1