Creating slices of dataframe groupby groups

Question

I have a DataFrame with 3 columns: location_id, customers, cluster. Previously, I clustered my data into 5 clusters, so the cluster column contains the values [0, 1, 2, 3, 4].

I would like to separate each cluster into 2 slices for my next stage of testing. E.g. 50-50 slice, or 30-70 slice, or 20-80 slice.

Question - How do I apply a function that adds a column to data.groupby('cluster')?

Ideal Result

   location_id  customers  cluster  slice
0       149213     132817        1      1
1       578371      76655        1      0
2        91703      74048        2      1
3       154868      62397        2      1
4      1022759      59162        2      0

Update

@MaxU's solution put me on the right path. It uses the DataFrame.assign function to add a new column, then compares each row's position within its group to the group's total length to assign a slice of the correct proportion. However, that code somehow did not work for me as written. I ended up splitting @MaxU's solution into separate steps, and that worked:

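import numpy as np

# Position of each row within its cluster (0, 1, 2, ...).
# group_keys=False keeps 'cluster' out of the index on newer pandas versions.
testgroup = (data.groupby('cluster', group_keys=False)
                 .apply(lambda x: x.assign(index1=np.arange(len(x)))))
# Number of rows in the cluster, repeated on every row
testgroup = (testgroup.groupby('cluster', group_keys=False)
                      .apply(lambda x: x.assign(total_len=len(x))))

# Strict < puts exactly the first half of a 12-row cluster in the slice
testgroup['is_slice'] = (testgroup['index1'] / testgroup['total_len']) < 0.5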

    location_id  customers  cluster  index1  total_len  is_slice
0        149213     132817        1       0         12      True
1        578371      76655        1       1         12      True
2         91703      74048        1       2         12      True
3        154868      62397        1       3         12      True
4       1022759      59162        1       4         12      True
5         87016      58134        1       5         12      True
6        649432      56849        1       6         12     False
7        219163      56802        1       7         12     False
8         97704      54718        1       8         12     False
9        248455      52806        1       9         12     False
10       184828      52783        1      10         12     False
11       152887      52565        1      11         12     False
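
For anyone who wants the same position-versus-length check without the two apply calls, here is a minimal sketch. It is not @MaxU's code. It uses groupby(...).cumcount() for the row position and transform("size") for the group length. The names add_slice, frac and group_col are hypothetical, and it assumes the rows are already in the order you want to slice them in. The sample rows are the five from the Ideal Result table.

```python
import pandas as pd


def add_slice(data: pd.DataFrame, frac: float = 0.5,
              group_col: str = "cluster") -> pd.DataFrame:
    """Flag the first `frac` of rows in each group with slice=1, the rest with 0.

    `add_slice`, `frac` and `group_col` are illustrative names, not from the
    original answer.
    """
    grouped = data.groupby(group_col)
    # 0, 1, 2, ... within each group, aligned to the original index
    position = grouped.cumcount()
    # Group length repeated on every row of the group
    group_size = grouped[group_col].transform("size")
    # A row is in the slice while its position is below frac * group length
    in_slice = position < frac * group_size
    return data.assign(slice=in_slice.astype(int))


# The five rows from the Ideal Result table in the question
data = pd.DataFrame({
    "location_id": [149213, 578371, 91703, 154868, 1022759],
    "customers": [132817, 76655, 74048, 62397, 59162],
    "cluster": [1, 1, 2, 2, 2],
})

result = add_slice(data, frac=0.5)
print(result)
```

Changing frac to 0.3 or 0.2 should give the 30-70 or 20-80 split. With odd group sizes the extra row lands in the first slice, which may or may not be what you want.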
