Steven B

Reputation: 37

Percentile function Python

Is there a convenient way to calculate the percentile of one column grouped by the values of another column in a data frame? For example, the 10th percentile of prices for each type of toy.

I have a method that gets me the answer, but it is very long-winded and won't work well on larger datasets:

import pandas as pd
import numpy as np

data = {'Toy': ['Truck', 'Truck', 'Truck', 'Barbie', 'Snake', 'Barbie'], 
    'Colour': ['Blue', 'Orange', 'Green', 'Pink','Green','Red'], 
    'Price': [4, 6, 8, 5, 9, 4]}
df = pd.DataFrame(data)
df

df2 = df.groupby('Toy')['Price'].agg(['sum', 'mean', lambda x: np.percentile(x, q=10)]).reset_index()
df2

df_result = pd.merge(df,df2, on= 'Toy', how='left')
df_result

This outputs the table I want, where the lambda column is the variable of interest.

Upvotes: 2

Views: 705

Answers (1)

rafaelc

Reputation: 59274

I'd say you don't need to make this so complicated (creating another df, using merge etc).

You can simply do:

res = df.groupby("Toy").Price.apply(np.percentile, 10)

Then use the index to align the results with the original rows:

df = df.set_index("Toy")
df.loc[:, "Percentile"] = res  # res is indexed by Toy, so values align per toy
df = df.reset_index()  # turn Toy back into a regular column
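If you'd rather skip the index juggling as well, a groupby transform might be worth a try. This is only a sketch, assuming the same sample data as in the question; the column name "Percentile" and the use of Series.quantile(0.10) in place of np.percentile(x, 10) are my choices here (both default to linear interpolation, so they should agree).

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Toy": ["Truck", "Truck", "Truck", "Barbie", "Snake", "Barbie"],
    "Colour": ["Blue", "Orange", "Green", "Pink", "Green", "Red"],
    "Price": [4, 6, 8, 5, 9, 4],
})

# transform broadcasts each group's 10th percentile back onto the
# original rows, so no merge or set_index/reset_index step is needed
df["Percentile"] = df.groupby("Toy")["Price"].transform(lambda s: s.quantile(0.10))

print(df)
```

Because transform returns a Series with the same index as df, the assignment lines up row by row without any extra matching.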

Upvotes: 1
