Reputation: 21981
In the foll. dataframe:
ps_variable_2 ps_variable_1
0 45.652174 60.000000
1 48.913043 13.333333
2 56.521739 55.555556
3 57.608696 37.777778
4 30.434783 44.444444
5 36.956522 77.777778
6 33.695652 64.444444
7 53.260870 37.777778
I want to get a new dataframe out of it with 3 columns: Top 33% Middle 33% Bottom 33%. Each column has 3 rows as below:
Top 33% Middle 33% Bottom 33%
Top 33%
Middle 33%
Bottom 33%
E.g. the first row contains the number of cases where ps_variable_1 is in the top 33% or top third-percentile, and the respective number of cases where ps_variable_2 is in the top third, middle third and bottom third percentile respectively.
Upvotes: 1
Views: 695
Reputation: 19957
You can use rank to get percentile for each row, cut them to top, mid bottom bins, and then count them.
(
df.apply(lambda x: pd.cut(x.rank(pct=True),
[0,1/3.0,2/3.0,1],
labels=["Bottom 33%","Middle 33%","Top 33%"]))
.apply(lambda y: y.value_counts())
)
Out[549]:
ps_variable_2 ps_variable_1
Bottom 33% 2 3
Middle 33% 3 2
Top 33% 3 3
Upvotes: 1