Reputation: 101
I want all groups of rows to be the same size, either by dropping the last rows of groups that are too big or by padding smaller groups with rows of zeros.
import pandas as pd

d = {'ID': ['a12', 'a12', 'a12', 'a12', 'a12', 'b33', 'b33', 'b33', 'b33',
            'v55', 'v55', 'v55', 'v55', 'v55', 'v55'],
     'Exp_A': [2.2, 2.2, 2.2, 2.2, 2.2, 3.1, 3.1, 3.1, 3.1, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5],
     'Exp_B': [2.4, 2.4, 2.4, 2.4, 2.4, 1.2, 1.2, 1.2, 1.2, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5],
     'A': [0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1],
     'B': [0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1]}
df1 = pd.DataFrame(data=d)
I want every df1.ID group to have size df1.groupby('ID').size().mean(), so df1 should look like:
A B Exp_A Exp_B ID
0 0 0 2.2 2.4 a12
1 0 0 2.2 2.4 a12
2 1 1 2.2 2.4 a12
3 0 1 2.2 2.4 a12
4 1 1 2.2 2.4 a12
5 0 0 3.1 1.2 b33
6 1 0 3.1 1.2 b33
7 0 1 3.1 1.2 b33
8 1 1 3.1 1.2 b33
9 0 0 3.1 1.2 b33
10 0 1 1.5 1.5 v55
11 1 0 1.5 1.5 v55
12 1 0 1.5 1.5 v55
13 1 1 1.5 1.5 v55
14 0 0 1.5 1.5 v55
Upvotes: 2
Views: 535
Reputation: 1078
Here is a solution without looping. You can first determine the required number of rows for each ID and then pad and trim to match it.
# Target number of rows for each ID: the mean group size
min_req = df1.groupby('ID').size().mean()
df = df1.copy()
# Adding an auto-increment column within each ID group
df['row_count'] = df.groupby(['ID']).cumcount() + 1
# Build one zeroed template row per ID that has fewer rows than required;
# append more copies than needed and delete the excess later
df2 = df.groupby('ID', as_index=False).max()
df2 = df2.loc[df2['row_count'] < int(min_req)]
df2 = df2.assign(A=0, B=0)
df = pd.concat([df] + [df2] * int(min_req), ignore_index=True)
# Recalculating the count; sorting pushes the zeroed rows to the end of each group
df = df.drop('row_count', axis=1)
df = df.sort_values(by=['ID', 'A', 'B'], ascending=[True, False, False])
df['row_count'] = df.groupby(['ID']).cumcount() + 1
# Dropping the excess rows
df = df.drop(df.loc[df['row_count'] > int(min_req)].index)
df = df.drop('row_count', axis=1)
df
df
A B Exp_A Exp_B ID
0 0 0 2.2 2.4 a12
1 0 0 2.2 2.4 a12
2 1 1 2.2 2.4 a12
3 0 1 2.2 2.4 a12
4 1 1 2.2 2.4 a12
17 0 0 3.1 1.2 b33
16 0 0 3.1 1.2 b33
15 0 0 3.1 1.2 b33
18 0 0 3.1 1.2 b33
19 0 0 3.1 1.2 b33
10 1 0 1.5 1.5 v55
11 1 0 1.5 1.5 v55
12 1 1 1.5 1.5 v55
13 0 0 1.5 1.5 v55
14 1 1 1.5 1.5 v55
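In the same no-loop spirit, the padding and trimming can also be done in a single reindex over an (ID, position) MultiIndex. This is a sketch rather than part of the answer above, and it assumes the padded rows should carry the group's Exp_* values forward and zero out A and B; unlike the sorted output above, it preserves the original row order within each group:

```python
import pandas as pd

d = {'ID': ['a12'] * 5 + ['b33'] * 4 + ['v55'] * 6,
     'Exp_A': [2.2] * 5 + [3.1] * 4 + [1.5] * 6,
     'Exp_B': [2.4] * 5 + [1.2] * 4 + [1.5] * 6,
     'A': [0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1],
     'B': [0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1]}
df1 = pd.DataFrame(d)

n = int(df1.groupby('ID').size().mean())

# Index each row by (ID, position-within-group), then reindex to exactly
# n positions per ID: short groups gain NaN rows, long groups lose the tail
df = df1.set_index(['ID', df1.groupby('ID').cumcount()])
full = pd.MultiIndex.from_product([df1['ID'].unique(), range(n)],
                                  names=['ID', 'pos'])
df = df.reindex(full)

# Fill the padded rows: carry group constants forward, zero out A and B
df[['Exp_A', 'Exp_B']] = df.groupby(level='ID')[['Exp_A', 'Exp_B']].ffill()
df[['A', 'B']] = df[['A', 'B']].fillna(0).astype(int)

res = df.reset_index('ID').reset_index(drop=True)
print(res)
```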
Upvotes: 2
Reputation: 164773
Here's one solution using GroupBy. The complication arises from your condition to add extra rows, with certain columns set to 0, whenever a particular group is too small.
g = df1.groupby('ID')
n = int(g.size().mean())

res = []
for _, df in g:
    k = len(df.index)
    excess = n - k
    if excess > 0:
        # pad with zeroed copies of the group's last row
        pad = pd.concat([df.iloc[[-1]].assign(A=0, B=0)] * excess)
        df = pd.concat([df, pad])
    res.append(df.iloc[:n])

res = pd.concat(res, ignore_index=True)
print(res)
A B Exp_A Exp_B ID
0 0 0 2.2 2.4 a12
1 0 0 2.2 2.4 a12
2 1 1 2.2 2.4 a12
3 0 1 2.2 2.4 a12
4 1 1 2.2 2.4 a12
5 0 0 3.1 1.2 b33
6 1 0 3.1 1.2 b33
7 0 1 3.1 1.2 b33
8 1 1 3.1 1.2 b33
9 0 0 3.1 1.2 b33
10 0 1 1.5 1.5 v55
11 1 0 1.5 1.5 v55
12 1 0 1.5 1.5 v55
13 1 1 1.5 1.5 v55
14 0 0 1.5 1.5 v55
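The excess bookkeeping in the loop above can also be delegated to reindex: trim each group to n rows, reindex to exactly n positions so missing rows appear as NaN, then fill. A minimal sketch, assuming (as above) that padded rows should repeat the group's Exp_* values and zero A and B:

```python
import pandas as pd

d = {'ID': ['a12'] * 5 + ['b33'] * 4 + ['v55'] * 6,
     'Exp_A': [2.2] * 5 + [3.1] * 4 + [1.5] * 6,
     'Exp_B': [2.4] * 5 + [1.2] * 4 + [1.5] * 6,
     'A': [0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1],
     'B': [0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1]}
df1 = pd.DataFrame(d)

n = int(df1.groupby('ID').size().mean())

parts = []
for _, g in df1.groupby('ID', sort=False):
    # trim to n rows, then reindex: missing positions become NaN rows
    g = g.head(n).reset_index(drop=True).reindex(range(n))
    g['ID'] = g['ID'].ffill()
    g[['Exp_A', 'Exp_B']] = g[['Exp_A', 'Exp_B']].ffill()
    g[['A', 'B']] = g[['A', 'B']].fillna(0).astype(int)
    parts.append(g)

res = pd.concat(parts, ignore_index=True)
print(res)
```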
Upvotes: 2