Reputation: 1647

Function that calculates the 80th percentile for a pandas dataframe

I'm working with a pandas DataFrame similar to the one below.

   School  students
0   A       44
1   B       38
2   C       33
3   D       29
4   E       28
5   F       25
6   G       23

I've written a function that is meant to iterate over the rows and accumulate the number of students across schools until the running sum is greater than or equal to 75% of all students. The function should then return the corresponding index of the DataFrame. (The column is already sorted.) My code below isn't working, and the error message follows it. Can you tell me what is wrong?

percentile = .75

def get_top(df,perc=percentile):
    thresh = perc*df['students'].sum()
    cum = 0
    for index, row in df.iterrows() :
        cum = cum + row['students']
        if cum >= thresh:
            return index-1
            break

output = df.apply(get_top)

KeyError: ('students', u'occurred at index School')

Upvotes: 0

Answers (2)

James Eaves

Reputation: 1647

As Jarad indicated in the comment section, I needed to change the function call to:

output = get_top(df,perc=percentile)
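
For context, `df.apply(get_top)` calls the function once per column and passes each column in as a Series, so the `df['students']` lookup inside the function fails on the first column (`School`). A minimal sketch of the direct call, assuming the DataFrame is rebuilt from the sample in the question (the construction below is an illustration, not the asker's actual data-loading code):

```python
import pandas as pd

# Sample data rebuilt from the question (assumed construction).
df = pd.DataFrame({
    "School": list("ABCDEFG"),
    "students": [44, 38, 33, 29, 28, 25, 23],
})

percentile = .75

def get_top(df, perc=percentile):
    # Threshold is a share of the total student count.
    thresh = perc * df["students"].sum()
    cum = 0
    for index, row in df.iterrows():
        cum = cum + row["students"]
        if cum >= thresh:
            # As in the question: return the index just before the crossing row.
            return index - 1

# Call the function on the whole DataFrame instead of via df.apply.
output = get_top(df, perc=percentile)
print(output)  # 3
```

With this data the running sum first reaches the threshold of 165 at index 4, so the function as written returns 3.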

Upvotes: 0

jezrael

Reputation: 862671

You can use numpy.where with cumsum:

print (0.75*df['students'].sum())
165.0

print (df.students.cumsum())
0     44
1     82
2    115
3    144
4    172
5    197
6    220
Name: students, dtype: int64

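import numpy as np

# Once the running total reaches 75% of all students, show the row index; otherwise show the running total
df['out'] = np.where(df.students.cumsum() >= 0.75*df['students'].sum(),
                    df.index,
                    df.students.cumsum())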
print (df)
  School  students  out
0      A        44   44
1      B        38   82
2      C        33  115
3      D        29  144
4      E        28    4
5      F        25    5
6      G        23    6

Or, if you want the percentile of the values themselves, use the quantile function:

print (df.students.quantile(.75))
35.5

df['out'] = np.where(df.students >= df.students.quantile(.75), 
                    df.students.cumsum(), 
                    df.index)
print (df)
  School  students  out
0      A        44   44
1      B        38   82
2      C        33    2
3      D        29    3
4      E        28    4
5      F        25    5
6      G        23    6
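
If only the single row index from the question is needed, rather than a new column, the same cumulative-sum comparison can be reduced to one label. This is a minimal sketch, not part of the original answer; the names `thresh`, `reached` and `first_idx` are illustrative, and the DataFrame is rebuilt from the sample data:

```python
import pandas as pd

# Sample data rebuilt from the question (assumed construction).
df = pd.DataFrame({
    "School": list("ABCDEFG"),
    "students": [44, 38, 33, 29, 28, 25, 23],
})

# 75% of all students.
thresh = 0.75 * df["students"].sum()

# Boolean Series: True from the row where the running total reaches the threshold.
reached = df["students"].cumsum() >= thresh

# idxmax returns the label of the first True value.
first_idx = reached.idxmax()
print(first_idx)  # 4
```

Note that `idxmax` returns the first label even when no value is True, so this relies on the threshold being reachable, which holds whenever the share is at most 1 and the counts are non-negative.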

Upvotes: 1
