Reputation: 439
I have a large dataframe that looks something like this:
ID Fruit Percentile
001 Apple 0
002 Pear 0
003 Banana 0
004 Kiwi 0
005 Orange 0
006 Pineapple 0
...
...
039 Peach 0
040 Grapes 0
I want to split the rows into 20 equal-sized percentile groups. The DataFrame is already sorted, so I just need a way to fill the "Percentile" column.
The final DataFrame should look like this:
ID Fruit Percentile
001 Apple 1
002 Pear 1
003 Banana 2
004 Kiwi 2
005 Orange 3
006 Pineapple 3
...
...
039 Peach 20
040 Grapes 20
I have tried to write a loop that does something like this:
n = len(df)  # number of rows
col = df.columns.get_loc('Percentile')  # position of the column to fill
df.iloc[0:int(n * 0.05), col] = 1
df.iloc[int(n * 0.05):int(n * 0.10), col] = 2
...
...
df.iloc[int(n * 0.95):n, col] = 20
Upvotes: 0
Views: 544
Reputation: 35646
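pd.cut can be used on a RangeIndex to split the rows into evenly sized groups. With labels=False it returns zero-based integer bin codes, so adding 1 makes the groups start at 1: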
df['Percentile'] = pd.cut(df.index, bins=20, labels=False) + 1
If the index is not already the default ascending, zero-based RangeIndex, we can use pd.RangeIndex with the length of the DataFrame to generate one instead:
df['Percentile'] = pd.cut(pd.RangeIndex(len(df)), bins=20, labels=False) + 1
np.arange works similarly:
df['Percentile'] = pd.cut(np.arange(len(df)), bins=20, labels=False) + 1
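To illustrate why the positional variants matter, here is a minimal sketch. It assumes a hypothetical 8-row frame whose index labels are out of order, as they might be after a sort without a reset of the index; the frame, its values, and the choice of 4 bins are made up for illustration. Binning df.index groups rows by label, while binning np.arange(len(df)) follows the current row order.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: index labels are out of order, as they would be
# after sort_values() without reset_index().
df = pd.DataFrame({'Fruit': list('ABCDEFGH')}, index=[7, 3, 5, 1, 6, 0, 2, 4])

# Binning the index labels groups rows by label value, not by row position.
by_label = pd.cut(df.index, bins=4, labels=False) + 1

# Binning the positions 0..len(df)-1 follows the current row order.
by_position = pd.cut(np.arange(len(df)), bins=4, labels=False) + 1

df['Percentile'] = by_position
print(df)
```

In this sketch only by_position puts consecutive rows into the same group, which is what an already sorted frame needs.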
Some sample data:
import numpy as np
import pandas as pd
n = 40
df = pd.DataFrame({
    'ID': [f'{i:03d}' for i in range(1, n + 1)],
    'Fruit': np.random.choice(['Apple', 'Pear', 'Banana', 'Kiwi', 'Orange'], n)
})
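Putting the pieces together, the following is a minimal end-to-end sketch that applies the np.arange variant to the sample data. The n_groups name and the group-size check at the end are additions for illustration only, not part of the approach described above.

```python
import numpy as np
import pandas as pd

n = 40
n_groups = 20  # assumed number of groups, matching bins=20

df = pd.DataFrame({
    'ID': [f'{i:03d}' for i in range(1, n + 1)],
    'Fruit': np.random.choice(['Apple', 'Pear', 'Banana', 'Kiwi', 'Orange'], n),
})

# Bin the row positions 0..n-1 into n_groups equal-width intervals,
# then shift the zero-based bin codes so the groups start at 1.
df['Percentile'] = pd.cut(np.arange(len(df)), bins=n_groups, labels=False) + 1

# Sanity check: with 40 rows and 20 groups, each group should hold 2 rows.
group_sizes = df['Percentile'].value_counts().sort_index()
print(df.head(6))
print(group_sizes.tolist())
```

When the number of rows is not a multiple of the number of bins, the groups can only be approximately equal in size, so a check like group_sizes is a cheap way to see how the rows were distributed.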
Upvotes: 1