Agnes Lee

Reputation: 321

Pandas: Creating a new column using a defined function

Assume I have a df:

import pandas as pd

df = pd.DataFrame({'alligator_apple': range(1, 11),
                   'barbadine': range(11, 21),
                   'capulin_cherry': range(21, 31)})

And a defined function to get percentile:

import math

def get_percentile(data, percentile):
    n = len(data)
    p = n * percentile / 100  # fractional position in the sorted data
    if p.is_integer():
        return sorted(data)[int(p)]
    else:
        return sorted(data)[int(math.ceil(p)) - 1]
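
As a quick check of the rank arithmetic, here is a minimal sketch that applies the same rule to a plain list holding the values 1 to 10. The helper name nearest_rank_value is made up for this illustration and is not a pandas function:

```python
import math

def nearest_rank_value(data, percentile):
    """Same rule as get_percentile above, written against a plain sequence."""
    n = len(data)
    p = n * percentile / 100  # fractional position in the sorted data
    if p.is_integer():
        return sorted(data)[int(p)]
    return sorted(data)[int(math.ceil(p)) - 1]

values = list(range(1, 11))  # same values as the alligator_apple column
results = {pct: nearest_rank_value(values, pct) for pct in range(25, 51, 5)}
print(results)
```

For example, with n = 10 the 25th percentile gives p = 2.5, which rounds up to 3, so the function returns the third sorted value.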

I'm looking to create a new dataframe that displays the 25th to 50th percentiles, in steps of 5, for every column in df.

My desired outcome looks like this:

percentile   alligator_apple   barbadine  capulin_cherry
        25                 3          13              23
        30                 4          14              24
        35                 4          14              24
        40                 5          15              25
        45                 5          15              25
        50                 6          16              26

I suppose I can loop through the rows & insert values of each percentile with the defined function, but is there a neater way to do this?

Upvotes: 0

Views: 41

Answers (1)

not_speshal

Reputation: 23146

A slight tweak of your get_percentile function, combined with pandas.concat and a lambda, might give you what you need:

def get_percentile(srs, percentile):
    n = len(srs)
    p = int(n * percentile / 100)
    return srs.sort_values().iat[p]

>>> pd.concat([df.apply(lambda x: get_percentile(x, p)).rename(p) for p in range(25,51,5)], axis=1).transpose()
    alligator_apple  barbadine  capulin_cherry
25                3         13              23
30                4         14              24
35                4         14              24
40                5         15              25
45                5         15              25
50                6         16              26
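
If every column is numeric, the same positions can probably be picked in one step by sorting the underlying array instead of calling the function once per column and percentile. This is only a sketch under that assumption; the variable names percentiles, positions and sorted_values are mine:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'alligator_apple': range(1, 11),
                   'barbadine': range(11, 21),
                   'capulin_cherry': range(21, 31)})

percentiles = list(range(25, 51, 5))
# Row position of each percentile in the sorted data,
# using the same rule as int(n * percentile / 100)
positions = [int(len(df) * p / 100) for p in percentiles]

# Sort every column independently, then pick the rows at those positions
sorted_values = np.sort(df.to_numpy(), axis=0)
result = pd.DataFrame(sorted_values[positions],
                      index=pd.Index(percentiles, name='percentile'),
                      columns=df.columns)
print(result)
```

This avoids both the lambda and the transpose, at the cost of assuming the columns share a sortable numeric dtype.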

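Alternatively, use DataFrame.quantile. Note that it uses a different percentile definition, so with interpolation='lower' the values at 30, 40 and 50 differ from your desired output: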

>>> pd.concat([df.quantile(p/100, interpolation='lower').rename(p) for p in range(25,51,5)], axis=1).transpose()
    alligator_apple  barbadine  capulin_cherry
25                3         13              23
30                3         13              23
35                4         14              24
40                4         14              24
45                5         15              25
50                5         15              25
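
DataFrame.quantile also accepts a list of quantiles, so the concat and transpose may not be needed at all. A minimal sketch, where relabelling the index with whole-number percentiles is my own choice rather than something quantile does for you:

```python
import pandas as pd

df = pd.DataFrame({'alligator_apple': range(1, 11),
                   'barbadine': range(11, 21),
                   'capulin_cherry': range(21, 31)})

percentiles = list(range(25, 51, 5))
# One call returns one row per requested quantile
result = df.quantile([p / 100 for p in percentiles], interpolation='lower')
# Replace the fractional index (0.25, 0.30, ...) with whole-number percentiles
result.index = pd.Index(percentiles, name='percentile')
print(result)
```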

Upvotes: 1
