Reputation: 1794
I'm not sure if "explode" is the right term here. I have a DataFrame with columns event_id
, num_steps
, and avg_time
.
I'd like to end up with a DataFrame where, for each event_id
, there are num_steps
rows with each row counting from 0 to num_steps
.
event_id, num_steps, avg_time
1, 3, 5
Should become:
event_id, num_steps, avg_time
1, 0, 5
1, 1, 5
1, 2, 5
Currently I'm iterating over the dataframe and creating this manually, but I'm wondering if there is any way to do this directly within Pandas to increase processing time?
Thanks!
Upvotes: 0
Views: 93
Reputation: 150735
This assumes the index is unique:
# toy data
df = pd.DataFrame({'event_id':[1,2],
'num_steps':[3,2],
'avg_time':[5,4]})
(df.loc[df.index.repeat(df['num_steps'])]
.assign(num_steps=lambda x: x.groupby(level=0).cumcount())
.reset_index(drop=True)
)
Output:
event_id num_steps avg_time
0 1 0 5
1 1 1 5
2 1 2 5
3 2 0 4
4 2 1 4
Upvotes: 1
Reputation: 323226
When we talking about the explode we do explode
, before that we need convert the max steps
to list
of steps .
Notice make sure your pandas version is later than 0.25.0
s=df.groupby(['event_id','avg_time']).num_steps.\
apply(lambda x : list(range(1,x.iloc[0]+1))).\
explode().reset_index()
Out[43]:
event_id avg_time num_steps
0 1 5 1
1 1 5 2
2 1 5 3
Upvotes: 1