Reputation: 3897
Given a pd.DataFrame such as:
print(pd.DataFrame([['a', 0, 'b'], ['c', 1, 'd'], ['f', 4, 'e']]))
   0  1  2
0  a  0  b
1  c  1  d
2  f  4  e
I would like to "fill in" rows by incrementing on the integer column. That is, I would like to obtain:
     0  1    2
0    a  0    b
1    c  1    d
2  NaN  2  NaN
3  NaN  3  NaN
4    f  4    e
As I will use this within a groupby operation on a large dataset, I am looking for the most efficient code to do this.
Upvotes: 0
Views: 1092
Reputation: 353209
You could turn column 1 into the index and reindex it over the full range of values:
In [33]: df.set_index(1).reindex(range(df[1].iloc[0], df[1].iloc[-1]+1)).reset_index()
Out[33]:
   1    0    2
0  0    a    b
1  1    c    d
2  2  NaN  NaN
3  3  NaN  NaN
4  4    f    e
and then you could reorder the columns if you cared.
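For reference, here is a self-contained sketch of the same idea with the column reordering included. The helper name fill_rows is hypothetical, and using min()/max() instead of iloc[0]/iloc[-1] is my own assumption so that the column does not need to be sorted; it does assume the integer values are unique.

```python
import pandas as pd

df = pd.DataFrame([['a', 0, 'b'], ['c', 1, 'd'], ['f', 4, 'e']])

def fill_rows(frame, col=1):
    """Hypothetical helper: insert NaN rows for missing integers in `col`."""
    lo, hi = int(frame[col].min()), int(frame[col].max())
    # Naming the new index after the column keeps the label through reset_index().
    full_index = pd.RangeIndex(lo, hi + 1, name=col)
    filled = frame.set_index(col).reindex(full_index).reset_index()
    # Restore the original column order.
    return filled[list(frame.columns)]

result = fill_rows(df)
print(result)
```

Inside a groupby this could be called as df.groupby(key).apply(fill_rows), with the caveat about apply speed mentioned below.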
Don't know about performance, but frankly custom groupby operations are pretty slow to start with. If speed is really critical, your best bet is to move this incrementing operation out of the groupby entirely if you can pull it off.
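One way this might look outside the groupby, as a rough sketch: build the complete (group, integer) grid with NumPy and left-merge the original rows onto it. The column names g, n and v are made up for illustration, and this assumes each integer appears at most once per group. I have not benchmarked it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'g': ['x', 'x', 'y', 'y'],   # hypothetical group key
    'n': [0, 3, 5, 7],           # integer column to fill in
    'v': ['a', 'b', 'c', 'd'],
})

# Per-group bounds; this is a cheap built-in aggregation, not a custom apply.
bounds = df.groupby('g')['n'].agg(['min', 'max'])
lengths = (bounds['max'] - bounds['min'] + 1).to_numpy()

# Repeat each group key and its start value once per row of the filled group.
keys = np.repeat(bounds.index.to_numpy(), lengths)
starts = np.repeat(bounds['min'].to_numpy(), lengths)
# Position within each group: 0, 1, ..., length - 1.
group_offsets = np.repeat(np.cumsum(lengths) - lengths, lengths)
within = np.arange(lengths.sum()) - group_offsets

grid = pd.DataFrame({'g': keys, 'n': starts + within})
# Rows in the grid with no match in df get NaN in the remaining columns.
result = grid.merge(df, on=['g', 'n'], how='left')
print(result)
```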
Upvotes: 2