Pandas: Drop Rows, if group's size is larger than mean

Question

I want all grouped rows to be the same size. I.e either by removing the last rows or adding zeros if the group has a small size.

d = {'ID':['a12', 'a12','a12','a12','a12','b33','b33','b33','b33','v55','v55','v55','v55','v55','v55'], 'Exp_A':[2.2,2.2,2.2,2.2,2.2,3.1,3.1,3.1,3.1,1.5,1.5,1.5,1.5,1.5,1.5], 
     'Exp_B':[2.4,2.4,2.4,2.4,2.4,1.2,1.2,1.2,1.2,1.5,1.5,1.5,1.5,1.5,1.5], 
     'A':[0,0,1,0,1,0,1,0,1,0,1,1,1,0,1], 'B':[0,0,1,1,1,0,0,1,1,1,0,0,1,0,1]}
df1 = pd.DataFrame(data=d)

I want all df1.ID to be size df1.groupby('ID').size().mean(). So df1 should look like:

    A   B   Exp_A   Exp_B   ID
0   0   0   2.2      2.4    a12
1   0   0   2.2      2.4    a12
2   1   1   2.2      2.4    a12
3   0   1   2.2      2.4    a12
4   1   1   2.2      2.4    a12

5   0   0   3.1      1.2    b33
6   1   0   3.1      1.2    b33
7   0   1   3.1      1.2    b33
8   1   1   3.1      1.2    b33
9   0   0   3.1      1.2    b33

10  0   1   1.5      1.5    v55
11  1   0   1.5      1.5    v55
12  1   0   1.5      1.5    v55
13  1   1   1.5      1.5    v55
14  0   0   1.5      1.5    v55

jpp · Accepted Answer

Here's one solution using GroupBy. The complication arises with your condition to add extra rows with certain columns set to 0, whenever a particular group is too small.

g = df1.groupby('ID')
n = int(g.size().mean())

res = []
for _, df in g:
    k = len(df.index)
    excess = n - k
    if excess > 0:
        df = df.append(pd.concat([df.iloc[[-1]].assign(A=0, B=0)]*excess))
    res.append(df.iloc[:n])

res = pd.concat(res, ignore_index=True)

print(res)

    A  B  Exp_A  Exp_B   ID
0   0  0    2.2    2.4  a12
1   0  0    2.2    2.4  a12
2   1  1    2.2    2.4  a12
3   0  1    2.2    2.4  a12
4   1  1    2.2    2.4  a12
5   0  0    3.1    1.2  b33
6   1  0    3.1    1.2  b33
7   0  1    3.1    1.2  b33
8   1  1    3.1    1.2  b33
9   0  0    3.1    1.2  b33
10  0  1    1.5    1.5  v55
11  1  0    1.5    1.5  v55
12  1  0    1.5    1.5  v55
13  1  1    1.5    1.5  v55
14  0  0    1.5    1.5  v55

Pandas: Drop Rows, if group's size is larger than mean

Answers (2)

Related Questions

Pandas: Drop Rows, if group&#39;s size is larger than mean

Answers (2)

Related Questions

Pandas: Drop Rows, if group's size is larger than mean