Reputation: 3568
Assuming we have a dataframe as below:
import pandas as pd

df = pd.DataFrame({'Col1': ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c'],
                   'col2': ['0.5', '0.78', '0.78', '0.4', '2', '9', '2', '7']})
I counted the number of rows for each unique value in Col1 (a has 4 rows; b and c have 2 rows each) by doing:
df.groupby(['Col1']).size()
and I get the output as
Col1
a 4
b 2
c 2
dtype: int64
Next, I would like to find which of a, b, and c has the most rows (here a, with 4) and pad the other groups with zeros up to that length. The number of zeros for each group is the difference between the maximum row count and the group's own row count: b and c have 2 rows each, so each needs 2 more zeros. The zeros must be added at the end.
I want to pad with zeros because I want to apply a fixed-size window to each group (a, b, c) to plot graphs.
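To make the target concrete, here is a minimal sketch of the padding amounts described above, computed from the group sizes. The names sizes, target, and pad_counts are only illustrative; this computes the counts and does not yet do the padding.

```python
import pandas as pd

df = pd.DataFrame({
    'Col1': ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c'],
    'col2': ['0.5', '0.78', '0.78', '0.4', '2', '9', '2', '7'],
})

sizes = df.groupby('Col1').size()  # rows per group
target = sizes.max()               # length every group should reach
pad_counts = target - sizes        # zeros to append to each group

print(pad_counts)
```

For this data, a needs no padding and b and c each need 2 zeros.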
Upvotes: 3
Views: 1025
Reputation: 323366
Same logic as Jez, using cumcount, but with an unstack and stack chain:
df.assign(key2=df.groupby('Col1').cumcount()).set_index(['Col1','key2']).unstack(fill_value=0).stack().reset_index('Col1')
Out[1047]:
Col1 col2
key2
0 a 0.5
1 a 0.78
2 a 0.78
3 a 0.4
0 b 2
1 b 9
2 b 0
3 b 0
0 c 2
1 c 7
2 c 0
3 c 0
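The one-liner above can be hard to follow, so here is the same chain split into steps as a rough sketch. The intermediate names keyed, wide, and padded are made up for illustration. Depending on the pandas version, stack() may emit a FutureWarning about its implementation.

```python
import pandas as pd

df = pd.DataFrame({
    'Col1': ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c'],
    'col2': ['0.5', '0.78', '0.78', '0.4', '2', '9', '2', '7'],
})

# Number the rows within each group: 0, 1, 2, ... per Col1 value.
keyed = df.assign(key2=df.groupby('Col1').cumcount())

# Pivot the counter into columns: one row per group, one column per position.
# Positions that a group does not have are filled with 0.
wide = keyed.set_index(['Col1', 'key2']).unstack(fill_value=0)

# Stack back to long form; every group now has the same number of rows.
padded = wide.stack().reset_index('Col1')

print(wide)
print(padded)
```

The wide intermediate already has one fixed-length row per group, which may be all that is needed for a fixed-size window.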
Upvotes: 1
Reputation: 863501
You can create a counter with GroupBy.cumcount, build a MultiIndex from it, and then call DataFrame.reindex with all combinations generated by MultiIndex.from_product:
df1 = df.set_index(['Col1', df.groupby('Col1').cumcount()])
mux = pd.MultiIndex.from_product(df1.index.levels, names=df1.index.names)
df2 = df1.reindex(mux, fill_value=0).reset_index(level=1, drop=True).reset_index()
print(df2)
Col1 col2
0 a 0.5
1 a 0.78
2 a 0.78
3 a 0.4
4 b 2
5 b 9
6 b 0
7 b 0
8 c 2
9 c 7
10 c 0
11 c 0
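One thing to watch: col2 in the question holds strings, so filling with 0 leaves a column that mixes strings with integers. If the values are meant to be numeric (an assumption, since the question does not say), a sketch like the following converts them first and then applies the same reindex approach. The names df_num and padded are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Col1': ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c'],
    'col2': ['0.5', '0.78', '0.78', '0.4', '2', '9', '2', '7'],
})

# Assumption: col2 should be numeric; the question stores it as strings.
df_num = df.assign(col2=pd.to_numeric(df['col2']))

# Index each row by (group, position within group).
df1 = df_num.set_index(['Col1', df_num.groupby('Col1').cumcount()])

# Build every (group, position) pair and fill the missing ones with 0.0.
mux = pd.MultiIndex.from_product(df1.index.levels, names=df1.index.names)
padded = (df1.reindex(mux, fill_value=0.0)
             .reset_index(level=1, drop=True)
             .reset_index())

print(padded.dtypes)
print(padded)
```

With a float column, rolling windows and plotting work on the padded values directly.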
Upvotes: 3