pandas get percentile of value within column

I have a dataframe:

d = [f1  f2  f3 
     1    2   3 
     5    1   2 
     3    3   1 
     2    4   7
     ..  ..  ..]

For each feature in a subset of features, I want to add a column holding the percentile of each row's value within that feature. So for subset = [f1, f2, f3] my dataframe will be

new_d =[f1   f2   f3  f1_per   f2_per   f3_per
         1    2   3    0         0.25     0.5
         5    1   2    1          0       0.25
         3    3   1    0.5        0.5     0
         2    4   7    0.25      0.75     1
         4    5   4    0.75       1       0.75]

What is the best way to do so?

Upvotes: 0

Answers (3)

jezrael

Reputation: 863671

The expected output has 5 rows while the input has 4, so the output here is different. Use DataFrame.rank on all columns and join the result back to the original:

df = df.join(df.rank(pct=True).add_prefix('pct'))
print (df)
   f1  f2  f3  pctf1  pctf2  pctf3
0   1   2   3    0.2    0.4    0.6
1   5   1   2    1.0    0.2    0.4
2   3   3   1    0.6    0.6    0.2
3   2   4   7    0.4    0.8    1.0
4   4   5   4    0.8    1.0    0.8

If you need the percentile computed as the rank minus 1, divided by the number of rows minus 1:

df = df.join(df.rank().sub(1).div(len(df) - 1).add_prefix('pct'))
print (df)
   f1  f2  f3  pctf1  pctf2  pctf3
0   1   2   3   0.00   0.25   0.50
1   5   1   2   1.00   0.00   0.25
2   3   3   1   0.50   0.50   0.00
3   2   4   7   0.25   0.75   1.00
4   4   5   4   0.75   1.00   0.75
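
The question asks for percentiles on a subset of features only, with a `_per` suffix. A minimal sketch of that variation, assuming the five-row data from the expected output; `add_percentiles` is a hypothetical helper name, not a pandas function:

```python
import pandas as pd

def add_percentiles(df, subset, suffix="_per"):
    """Append (rank - 1) / (n - 1) columns for the columns listed in `subset`.

    Hypothetical helper for illustration; not part of pandas.
    """
    # rank() returns 1..n per column (ties get the average rank);
    # subtracting 1 and dividing by n - 1 rescales that to 0..1
    pct = df[subset].rank().sub(1).div(len(df) - 1).add_suffix(suffix)
    return df.join(pct)

df = pd.DataFrame({"f1": [1, 5, 3, 2, 4],
                   "f2": [2, 1, 3, 4, 5],
                   "f3": [3, 2, 1, 7, 4]})

# Only f1 and f3 get percentile columns; f2 is left as-is
new_d = add_percentiles(df, ["f1", "f3"])
print(new_d)
```

Selecting the subset before ranking avoids computing ranks for columns that are not needed, and returning a new frame leaves the original `df` unchanged.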

Upvotes: 2

Dani Mesejo

Reputation: 61930

Here is another approach, doing explicitly what you want:

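import numpy as np

# For each column, count the values strictly smaller than each entry, then scale to 0..1
res = df.apply(lambda x: np.greater.outer(x.values, x.values).sum(axis=1) / (len(x) - 1))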
res.columns = [f'{c}_per' for c in df.columns]
res = df.join(res)
print(res)

Output

   f1  f2  f3  f1_per  f2_per  f3_per
0   1   2   3    0.00    0.25    0.50
1   5   1   2    1.00    0.00    0.25
2   3   3   1    0.50    0.50    0.00
3   2   4   7    0.25    0.75    1.00
4   4   5   4    0.75    1.00    0.75
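
To see why this matches the rank-based formula: `np.greater.outer(x, x)[i, j]` is True when `x[i] > x[j]`, so each row sum counts the values strictly smaller than `x[i]`. A small check on a single column, assuming the `f1` values from the example; the comparison with `rank(method="min")` is added here for illustration and is not part of the answer above:

```python
import numpy as np
import pandas as pd

x = pd.Series([1, 5, 3, 2, 4], name="f1")

# greater[i, j] is True when x[i] > x[j]
greater = np.greater.outer(x.values, x.values)

# Row sums count how many values are strictly smaller than each element
smaller_counts = greater.sum(axis=1)
outer_pct = smaller_counts / (len(x) - 1)

# rank(method="min") is 1 + the number of strictly smaller values,
# so subtracting 1 yields the same counts, ties included
rank_pct = (x.rank(method="min") - 1) / (len(x) - 1)

print(smaller_counts)
print(np.allclose(outer_pct, rank_pct.values))
```

The outer comparison builds an n-by-n boolean matrix for each column, so its memory use grows quadratically with the number of rows. With tied values it also differs from the default `rank()` (which averages tied ranks), which is why `method="min"` is used in the comparison.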

Upvotes: 0

Serge de Gosson de Varennes

Reputation: 11532

A way to do this is the following:

df['pct_f1'] = df.f1.rank(pct=True)
df['pct_f2'] = df.f2.rank(pct=True)
df['pct_f3'] = df.f3.rank(pct=True)

which gives:

   f1  f2  f3  pct_f1  pct_f2  pct_f3
0   1   2   3    0.25    0.50    0.75
1   5   1   2    1.00    0.25    0.50
2   3   3   1    0.75    0.75    0.25
3   2   4   7    0.50    1.00    1.00
Upvotes: 2
