Reputation: 415
I would like to sort a pandas dataframe by the rows which have the most even distribution but also high values. For example:
Row Attribute1 Attribute2 Attribute3 a 1 1 108 b 10 2 145 c 50 60 55 d 100 90 120 e 20 25 23 f 1000 30 0
Rows d and c should rank the highest, ideally d followed by c.
I considered using standard deviation to identify the most even distribution and then mean to get the highest average values but I'm unsure as to how I can combine these together.
Upvotes: 0
Views: 155
Reputation: 1560
As the perception of "even distribution" you mention seems to be quite subjective, here is an instuction to implement the coefficient of variation mentionned by @ALollz.
df.std(axis=1) / df.mean(axis=1)
Row 0
a 1.6848130582715446
b 1.535375387727906
c 0.09090909090909091
d 0.14782502241793033
e 0.11102697698927574
f 1.6569547684031352
This metrics is the percentage of the mean represented by the standard deviation. If you have a row mean of 10 and a standard deviation of 1, the ratio will be 10% or 0.1
In this example, the row that could be considered most 'evenly distributed' is the row c: its mean is 55 and its standard deviation is 5. Therefore the ratio is about 9%.
This way, you can have a decent overview of the homogeneity of the distribution.
If you want the ranking, you can apply .sort_values
:
(df.std(axis=1) / df.mean(axis=1)).sort_values()
Row 0
c 0.09090909090909091
e 0.11102697698927574
d 0.14782502241793033
b 1.535375387727906
f 1.6569547684031352
a 1.6848130582715446
My last words would be to not be fooled by our brain's perception: it can be easily tricked by statistics.
Now if you want to improve results of higher values, you can divide this coefficient by the mean: the higher the mean, the lower the coefficient.
(df.std(axis=1) / df.mean(axis=1)**2).sort_values()
Row 0
d 0.0014305647330767452
c 0.001652892561983471
f 0.004826081849717869
e 0.004898248984820989
b 0.029338383204991835
a 0.045949447043769395
And now we obtain the desired ranking : d first, then c, f, e, b and a
Upvotes: 1