star_it8293

Reputation: 439

Fill in dataframe column into separate percentiles

I have a large dataframe that looks something like this:

ID   Fruit       Percentile
001  Apple          0
002  Pear           0
003  Banana         0
004  Kiwi           0
005  Orange         0
006  Pineapple      0
...
...
039  Peach          0
040  Grapes         0

I want to assign the rows to 40 different percentile groups. The DataFrame is already sorted, so I just need a way to fill the "Percentile" column.

The final DataFrame should look like this:

ID   Fruit       Percentile
001  Apple          1
002  Pear           1
003  Banana         2
004  Kiwi           2
005  Orange         3
006  Pineapple      3
...
...
039  Peach          40
040  Grapes         40

I have tried to write a loop that does something like this:

n = len(df)  # number of rows (df.size is the total cell count, not a tuple)
col = df.columns.get_loc('Percentile')  # positional index of the target column
df.iloc[0:int(n * 0.05), col] = 1
df.iloc[int(n * 0.05):int(n * 0.10), col] = 2
...
...
df.iloc[int(n * 0.95):n, col] = 20

Upvotes: 0

Views: 544

Answers (1)

Henry Ecker

Reputation: 35646

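pd.cut can be used on a RangeIndex to split the rows into evenly sized groups. labels=False returns zero-based integer bin codes, so adding 1 makes the group numbers start at 1: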

df['Percentile'] = pd.cut(df.index, bins=20, labels=False) + 1
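As a quick check of what this produces, here is a minimal sketch on a hypothetical six-row frame split into three groups. The frame name `demo` and the bin count of 3 are illustrative assumptions, not taken from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical six-row frame with the default RangeIndex 0..5
demo = pd.DataFrame(
    {'Fruit': ['Apple', 'Pear', 'Banana', 'Kiwi', 'Orange', 'Pineapple']}
)

# Bin the row labels 0..5 into 3 equal-width bins, then shift codes to start at 1
codes = np.asarray(pd.cut(demo.index, bins=3, labels=False))
demo['Percentile'] = codes + 1

print(demo['Percentile'].tolist())  # [1, 1, 2, 2, 3, 3]
```

Because the rows are already sorted, consecutive rows land in the same group, which matches the layout asked for in the question.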

If the index is not already the default ascending, zero-based RangeIndex, pd.cut would bin by index label rather than by row position. In that case we can use pd.RangeIndex with the length of the DataFrame to generate positions instead:

df['Percentile'] = pd.cut(pd.RangeIndex(len(df)), bins=20, labels=False) + 1
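The difference only shows up when the index labels are not evenly spaced. A small sketch, assuming a hypothetical four-row frame whose last index label is 100 (the frame and its labels are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels are not 0..n-1
demo = pd.DataFrame(
    {'Fruit': ['Apple', 'Pear', 'Banana', 'Kiwi']},
    index=[0, 1, 2, 100],
)

# Binning the index labels splits by label value, so the groups come out uneven
by_label = np.asarray(pd.cut(demo.index, bins=2, labels=False)) + 1

# Binning row positions splits by order, regardless of the labels
by_position = np.asarray(
    pd.cut(pd.RangeIndex(len(demo)), bins=2, labels=False)
) + 1

print(by_label.tolist())     # [1, 1, 1, 2]
print(by_position.tolist())  # [1, 1, 2, 2]
```

Using positions rather than labels is the safer default whenever the frame has been filtered, re-sorted, or given a custom index.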

np.arange works similarly:

df['Percentile'] = pd.cut(np.arange(len(df)), bins=20, labels=False) + 1

Some sample data:

import numpy as np
import pandas as pd

n = 40
df = pd.DataFrame({
    'ID': [f'{i:03d}' for i in range(1, n + 1)],
    'Fruit': np.random.choice(['Apple', 'Pear', 'Banana', 'Kiwi', 'Orange'], n)
})
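Putting the pieces together, the following sketch rebuilds the same 40-row sample frame, applies the np.arange variant, and counts the rows per group. The variable name `group_sizes` is an illustrative choice:

```python
import numpy as np
import pandas as pd

n = 40
df = pd.DataFrame({
    'ID': [f'{i:03d}' for i in range(1, n + 1)],
    'Fruit': np.random.choice(['Apple', 'Pear', 'Banana', 'Kiwi', 'Orange'], n)
})

# Split the 40 row positions into 20 equal-width bins, numbered from 1
df['Percentile'] = pd.cut(np.arange(len(df)), bins=20, labels=False) + 1

# Count how many rows fell into each group
group_sizes = df['Percentile'].value_counts().sort_index()

print(df.head(6))
print(group_sizes.tolist())
```

With 40 rows and 20 bins, every group receives two rows, running from group 1 for the first two rows to group 20 for the last two.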

Upvotes: 1
