Reputation: 321
Assume I have a df:
import pandas as pd

df = pd.DataFrame({'alligator_apple': range(1, 11),
                   'barbadine': range(11, 21),
                   'capulin_cherry': range(21, 31)})
And a defined function to get percentile:
import math

def get_percentile(data, percentile):
    n = len(data)
    # position of the requested percentile among the sorted values
    p = n * percentile / 100
    if p.is_integer():
        return sorted(data)[int(p)]
    else:
        return sorted(data)[int(math.ceil(p)) - 1]
I'm looking to create a new DataFrame that shows the 25th to 50th percentiles, in steps of 5, for every column in df.
My desired outcome looks like this:
percentile  alligator_apple  barbadine  capulin_cherry
        25                3         13              23
        30                4         14              24
        35                4         14              24
        40                5         15              25
        45                5         15              25
        50                6         16              26
I suppose I can loop through the rows & insert values of each percentile with the defined function, but is there a neater way to do this?
Upvotes: 0
Views: 41
Reputation: 23146
A slight tweak of your get_percentile function, combined with pandas.concat and a lambda, might give you what you need:
def get_percentile(srs, percentile):
    n = len(srs)
    p = int(n * percentile / 100)
    return srs.sort_values().iat[p]
>>> pd.concat([df.apply(lambda x: get_percentile(x, p)).rename(p) for p in range(25,51,5)], axis=1).transpose()
    alligator_apple  barbadine  capulin_cherry
25                3         13              23
30                4         14              24
35                4         14              24
40                5         15              25
45                5         15              25
50                6         16              26
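The same index rule can also be applied to every column in one pass, without building one Series per percentile. This is only a minimal sketch, assuming integer percentiles below 100 and purely numeric columns; the names `percentiles`, `positions`, `sorted_values`, and `result` are illustrative and not part of the original posts.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'alligator_apple': range(1, 11),
                   'barbadine': range(11, 21),
                   'capulin_cherry': range(21, 31)})

percentiles = np.arange(25, 51, 5)

# Same rule as int(n * percentile / 100), written with integer floor division.
# Assumption: integer percentiles below 100, otherwise a position is out of range.
positions = len(df) * percentiles // 100

# Sort each column independently, then pick the rows at the computed positions.
sorted_values = np.sort(df.to_numpy(), axis=0)
result = pd.DataFrame(sorted_values[positions],
                      index=pd.Index(percentiles, name='percentile'),
                      columns=df.columns)
print(result)
```

For the sample df this should reproduce the values in the desired outcome, with the percentiles as a named index.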
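Alternatively, use DataFrame.quantile. Note that with interpolation='lower' the values differ from your desired output at the 30th, 40th, and 50th percentiles, because pandas places the quantile at position (n - 1) * q rather than n * q: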
>>> pd.concat([df.quantile(p/100, interpolation='lower').rename(p) for p in range(25,51,5)], axis=1).transpose()
    alligator_apple  barbadine  capulin_cherry
25                3         13              23
30                3         13              23
35                4         14              24
40                4         14              24
45                5         15              25
50                5         15              25
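DataFrame.quantile also accepts a list of quantiles, so the concat and transpose steps can likely be dropped. The sketch below assumes the pandas quantile definition is acceptable (it does not reproduce the asker's get_percentile rule); `percentiles` and `result` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'alligator_apple': range(1, 11),
                   'barbadine': range(11, 21),
                   'capulin_cherry': range(21, 31)})

percentiles = list(range(25, 51, 5))

# One call computes every requested quantile for every column;
# the result has one row per quantile, indexed by the fractions 0.25 ... 0.50.
result = df.quantile([p / 100 for p in percentiles], interpolation='lower')

# Relabel the rows with whole-number percentiles instead of fractions.
result.index = pd.Index(percentiles, name='percentile')
print(result)
```

The values should match the quantile-based table above, not the table in the question.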
Upvotes: 1