Reputation: 77
I have a dataframe that looks like this:
- | date_x | date_y | date_z |
---|---|---|---|
0 | 2021-02-19 12:26:48+00:00 | 2021-02-19 16:15:21.510000+00:00 | NaT |
1 | 2021-02-19 16:16:54+00:00 | 2021-02-10 17:44:01+00:00 | 2021-02-11 10:10:01+00:00 |
1 | 2021-02-21 16:17:54+00:00 | 2021-02-20 17:44:01+00:00 | 2021-02-15 10:10:01+00:00 |
and I want to calculate for each row the order of each column by row:
- | date_x | date_y | date_z |
---|---|---|---|
0 | 0 | 1 | NaN |
1 | 2 | 0 | 1 |
1 | 2 | 1 | 0 |
To then get the average order of each column for the whole dataframe:
- | date_x | date_y | date_z |
---|---|---|---|
0 | 1,33333333333333 | 0,666666666666667 | 0.5 |
I tried using pandas Series.argsort()
but it's the same as Series.sort_values()
but instead returning indexes not values.
I then thought of iterating each row and looking up the index of the list provided by sort_values but I'm sure there MUST be a faster and cleaner solution?
Thanks
Upvotes: 0
Views: 50
Reputation: 863166
You can use DataFrame.rank
, subtract 1, get mean
and convert Series
to one row DataFrame
by Series.to_frame
and transpose:
df = df.rank(axis=1).sub(1).mean().to_frame().T
print (df)
date_x date_y date_z
0 1.333333 0.666667 0.5
Upvotes: 1