Reputation: 3672
Given a dataframe
A B C
3 1 2
2 1 3
3 2 1
I would like to add a new column that lists, for each row, the column names ordered by that row's values (ascending):
A B C new_col
3 1 2 [B,C,A]
2 1 3 [B,A,C]
3 2 1 [C,B,A]
This is my code. It works but is quite slow.
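import operator

col_list = ['A', 'B', 'C']  # columns to rank; assumed, not shown in the original post

def blist(x):
    # Map each column name to this row's value, then sort the names by value.
    col_dict = {}
    for col in col_list:
        col_dict[col] = x[col]
    sorted_tuple = sorted(col_dict.items(), key=operator.itemgetter(1))
    return [i[0] for i in sorted_tuple]

df['new_col'] = df.apply(blist, axis=1)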
I would appreciate a faster approach to this problem.
Upvotes: 2
Views: 61
Reputation: 210882
Try using np.argsort() in conjunction with np.take():
In [132]: df['new_col'] = np.take(df.columns, np.argsort(df)).tolist()
In [133]: df
Out[133]:
A B C new_col
0 3 1 2 [B, C, A]
1 2 1 3 [B, A, C]
2 3 2 1 [C, B, A]
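To make the mechanics explicit, here is a minimal sketch of the same idea written against plain NumPy arrays. It assumes the three-row example frame from the question; the variable names (values, names, order) are illustrative, and kind="stable" is an addition so that tied values keep their original column order. Working on arrays rather than passing the DataFrame itself to np.argsort may also be more robust across pandas/NumPy versions, though that is an assumption rather than something tested here.

```python
import numpy as np
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({"A": [3, 2, 3], "B": [1, 1, 2], "C": [2, 3, 1]})

# Plain arrays: the cell values and the column labels.
values = df.to_numpy()
names = df.columns.to_numpy()

# order[i, j] is the column position of the j-th smallest value in row i.
# kind="stable" keeps tied values in their original column order.
order = np.argsort(values, axis=1, kind="stable")

# Integer-array indexing maps every position to its column name, row by row.
sorted_names = names[order]

df["new_col"] = sorted_names.tolist()
print(df)
```

Note that values and names are captured before new_col is assigned, so re-running only the last lines does not pull new_col itself into the ranking.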
Timing for a 30,000-row DataFrame:
In [182]: df = pd.concat([df] * 10**4, ignore_index=True)
In [183]: df.shape
Out[183]: (30000, 3)
In [184]: %timeit df.apply(blist,axis=1)
4.84 s ± 31 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [185]: %timeit np.take(df.columns, np.argsort(df)).tolist()
5.45 ms ± 26.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Ratio:
In [187]: (4.84*1000)/5.45
Out[187]: 888.0733944954128
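A speed-up only matters if both approaches return the same lists, so a quick equivalence check may be worthwhile. The sketch below is an illustration under stated assumptions: the random test frame, the fixed seed, and the names slow and fast are all made up for this check, the row-wise function is called through iterrows() rather than df.apply, and kind="stable" is used so that ties are resolved in column order, matching the stable behaviour of Python's sorted().

```python
import operator

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
# Small integer range so that tied values occur often.
df = pd.DataFrame(rng.integers(0, 4, size=(1000, 3)), columns=list("ABC"))
col_list = list(df.columns)

def blist(row):
    # Row-wise approach from the question: sort (name, value) pairs by value.
    pairs = sorted(((col, row[col]) for col in col_list),
                   key=operator.itemgetter(1))
    return [name for name, _ in pairs]

# Reference result, computed one row at a time.
slow = [blist(row) for _, row in df.iterrows()]

# Vectorised result: stable argsort, then index into the column labels.
order = np.argsort(df.to_numpy(), axis=1, kind="stable")
fast = df.columns.to_numpy()[order].tolist()

print(slow == fast)
```

Without kind="stable", the order of column names for tied values is not guaranteed to match the row-wise version.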
Upvotes: 3