user308827
user308827

Reputation: 21981

Aggregating pandas dataframe into percentile ranks for multiple columns

In the foll. dataframe:

    ps_variable_2  ps_variable_1
0       45.652174      60.000000
1       48.913043      13.333333
2       56.521739      55.555556
3       57.608696      37.777778
4       30.434783      44.444444
5       36.956522      77.777778
6       33.695652      64.444444
7       53.260870      37.777778

I want to get a new dataframe out of it with 3 columns: Top 33% Middle 33% Bottom 33%. Each column has 3 rows as below:

                Top 33%     Middle 33%   Bottom 33%
Top 33%         
Middle 33%          
Bottom 33%          

E.g. the first row contains the number of cases where ps_variable_1 is in the top 33% or top third-percentile, and the respective number of cases where ps_variable_2 is in the top third, middle third and bottom third percentile respectively.

Upvotes: 1

Views: 695

Answers (1)

Allen Qin
Allen Qin

Reputation: 19957

You can use rank to get percentile for each row, cut them to top, mid bottom bins, and then count them.

(
    df.apply(lambda x: pd.cut(x.rank(pct=True),
                              [0,1/3.0,2/3.0,1],
                              labels=["Bottom 33%","Middle 33%","Top 33%"]))
    .apply(lambda y: y.value_counts())
)
Out[549]: 
            ps_variable_2  ps_variable_1
Bottom 33%              2              3
Middle 33%              3              2
Top 33%                 3              3

Upvotes: 1

Related Questions