Separate DataFrame into N (almost) equal segments

Question

Say I have a DataFrame with the columns Id and ColA.

I would like my code to add a new column at the end of the DataFrame that splits the observations into 5 (almost) equal segments, numbered from 5 down to 1, like this:

Id  ColA    Segment
1   2        5  
2   2        5
3   3        4
4   5        4
5   10       3
6   12       3
7   18       2
8   20       2
9   25       1
10  26       1

I tried the following code, but it doesn't work:

df['segment'] = pd.qcut(df['Id'],5)

I also want to know what would happen if the total number of observations were not divisible by 5.

cs95 · Accepted Answer

Actually, you were closer to the answer than you think. This will work regardless of whether len(df) is a multiple of 5 or not.

bins = 5
df['Segment'] = bins - pd.qcut(df['Id'], bins).cat.codes

df
   Id  ColA  Segment
0   1     2        5
1   2     2        5
2   3     3        4
3   4     5        4
4   5    10        3
5   6    12        3
6   7    18        2
7   8    20        2
8   9    25        1
9  10    26        1

Where,

pd.qcut(df['Id'], bins).cat.codes

0    0
1    0
2    1
3    1
4    2
5    2
6    3
7    3
8    4
9    4
dtype: int8

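Represents the categorical intervals returned by pd.qcut as integer codes (0 for the lowest bin up to bins - 1 for the highest). Subtracting these codes from bins gives segments numbered from 5 down to 1.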
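
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It assumes the ten-row frame from the question, rebuilt by hand. The `labels=False` call is an alternative not used in the answer above; it asks pd.qcut for the integer bin indices directly instead of going through `.cat.codes`.

```python
import pandas as pd

# Ten-row frame rebuilt from the question (assumed to match the asker's data).
df = pd.DataFrame({
    "Id": range(1, 11),
    "ColA": [2, 2, 3, 5, 10, 12, 18, 20, 25, 26],
})

bins = 5

# Integer code of the quantile bin each Id falls into (0 = lowest bin).
codes = pd.qcut(df["Id"], bins).cat.codes

# Alternative: labels=False makes qcut return the bin indices directly.
codes_direct = pd.qcut(df["Id"], bins, labels=False)

# Flip the codes so the lowest Ids get segment 5 and the highest get 1.
df["Segment"] = bins - codes

print(df)
```

Both `codes` and `codes_direct` hold the same bin numbers here, so either can be subtracted from `bins`.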

Another example, for a DataFrame with 7 rows.

df = df.head(7).copy()
df['Segment'] = bins - pd.qcut(df['Id'], bins).cat.codes

df

   Id  ColA  Segment
0   1     2        5
1   2     2        5
2   3     3        4
3   4     5        3
4   5    10        2
5   6    12        1
6   7    18        1
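
To check how the rows are distributed when the length is not a multiple of 5, one can count the rows per segment. A small sketch, assuming a seven-row frame with consecutive Ids like the one above (the name `sizes` is only for illustration):

```python
import pandas as pd

# Seven rows with consecutive Ids, as in the example above.
df = pd.DataFrame({"Id": range(1, 8)})

bins = 5
df["Segment"] = bins - pd.qcut(df["Id"], bins).cat.codes

# Number of rows in each segment, listed from segment 5 down to 1.
sizes = df["Segment"].value_counts().sort_index(ascending=False)
print(sizes)
```

For these seven rows the segments hold 2, 1, 1, 1 and 2 rows: the extra rows land wherever the quantile edges of Id put them, not necessarily in the first segments.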
