Roger Steinberg

Reputation: 1604

Separate DataFrame into N (almost) equal segments

Say I have a data frame that looks like this:

Id  ColA
1   2           
2   2        
3   3        
4   5        
5   10       
6   12       
7   18       
8   20       
9   25       
10  26          

I would like my code to add a new column at the end of the DataFrame that splits the observations into 5 segments of (almost) equal size, labeled from 5 down to 1:

Id  ColA    Segment
1   2        5  
2   2        5
3   3        4
4   5        4
5   10       3
6   12       3
7   18       2
8   20       2
9   25       1
10  26       1  

I tried the following code, but it doesn't work:

df['segment'] = pd.qcut(df['Id'],5)

I would also like to know what would happen if the total number of observations were not divisible by 5.

Upvotes: 2

Views: 764

Answers (2)

Axel Puig

Reputation: 1334

This should work:

df['segment'] = np.linspace(1, 6, len(df), endpoint=False, dtype=int)

This creates an array of integers from 1 to 5 with the same length as your DataFrame. If you want the labels to run from 5 down to 1, just append [::-1] to the end of the line.
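As a rough, self-contained sketch of the line above: the helper name linspace_segments and the rebuilt sample frame are assumptions made for illustration, not part of the original answer. It also shows what this approach produces when the row count is not a multiple of 5.

```python
import numpy as np
import pandas as pd

def linspace_segments(n_rows, bins=5):
    """Hypothetical helper: labels running from `bins` down to 1."""
    # endpoint=False leaves out bins + 1 itself, so casting to int
    # yields only the labels 1..bins
    labels = np.linspace(1, bins + 1, n_rows, endpoint=False, dtype=int)
    # reverse so the first rows get the highest label
    return labels[::-1]

# sample data rebuilt from the question
df = pd.DataFrame({"Id": range(1, 11),
                   "ColA": [2, 2, 3, 5, 10, 12, 18, 20, 25, 26]})
df["segment"] = linspace_segments(len(df))

print(df["segment"].tolist())        # [5, 5, 4, 4, 3, 3, 2, 2, 1, 1]
print(linspace_segments(7).tolist()) # [5, 4, 3, 3, 2, 1, 1]
```

With 7 rows, the extra rows land in segments 3 and 1 rather than at the two ends, so the split can differ from a quantile-based one.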

Upvotes: 2

cs95

Reputation: 402263

Actually, you were closer to the answer than you think. This will work regardless of whether len(df) is a multiple of 5 or not.

bins = 5
df['Segment'] = bins - pd.qcut(df['Id'], bins).cat.codes

df
   Id  ColA  Segment
0   1     2        5
1   2     2        5
2   3     3        4
3   4     5        4
4   5    10        3
5   6    12        3
6   7    18        2
7   8    20        2
8   9    25        1
9  10    26        1

Here, the expression

pd.qcut(df['Id'], bins).cat.codes

0    0
1    0
2    1
3    1
4    2
5    2
6    3
7    3
8    4
9    4
dtype: int8

represents the categorical intervals returned by pd.qcut as integer codes.
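To make the relationship between intervals and codes concrete, here is a small sketch on a rebuilt 10-row Id column (the sample frame is an assumption). It also uses the labels=False option of pd.qcut, which returns the integer bin indices directly and should match .cat.codes here.

```python
import pandas as pd

bins = 5
df = pd.DataFrame({"Id": range(1, 11)})

# qcut returns a categorical Series of quantile intervals
cats = pd.qcut(df["Id"], bins)
print(cats.cat.categories)   # the 5 intervals, in ascending order

# .cat.codes maps each interval to its position: 0 for the lowest
codes = cats.cat.codes

# labels=False skips the intervals and returns the bin indices
direct = pd.qcut(df["Id"], bins, labels=False)

# subtracting from `bins` flips 0..4 into 5..1
df["Segment"] = bins - codes
print(df["Segment"].tolist())  # [5, 5, 4, 4, 3, 3, 2, 2, 1, 1]
```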


Another example, for a DataFrame with 7 rows.

df = df.head(7).copy()
df['Segment'] = bins - pd.qcut(df['Id'], bins).cat.codes

df

   Id  ColA  Segment
0   1     2        5
1   2     2        5
2   3     3        4
3   4     5        3
4   5    10        2
5   6    12        1
6   7    18        1
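One way to see how the 7 rows were distributed is to count the rows per segment. The sketch below rebuilds the 7-row Id column as an assumption and is only meant as a quick check, not as part of the original answer.

```python
import pandas as pd

bins = 5
df = pd.DataFrame({"Id": range(1, 8)})   # 7 rows, not a multiple of 5
df["Segment"] = bins - pd.qcut(df["Id"], bins).cat.codes

# rows per segment, listed from segment 5 down to segment 1
sizes = df["Segment"].value_counts().sort_index(ascending=False)
print(sizes.index.tolist())  # [5, 4, 3, 2, 1]
print(sizes.tolist())        # [2, 1, 1, 1, 2]
```

In this case the two extra rows end up in the first and last segments, and every segment still gets at least one row.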

Upvotes: 4
