Pandas explode row to variable number of rows

Question

I'm not sure if "explode" is the right term here. I have a DataFrame with columns event_id, num_steps, and avg_time.

I'd like to end up with a DataFrame where, for each event_id, there are num_steps rows, with the num_steps column in each row counting from 0 to num_steps - 1.

event_id, num_steps, avg_time

1, 3, 5

Should become:

event_id, num_steps, avg_time

1, 0, 5

1, 1, 5

1, 2, 5

Currently I'm iterating over the DataFrame and building this manually, but I'm wondering if there is a way to do this directly within Pandas to reduce processing time?

Thanks!

Quang Hoang · Accepted Answer

This assumes the index is unique:

import pandas as pd

# toy data
df = pd.DataFrame({'event_id':[1,2],
                   'num_steps':[3,2],
                   'avg_time':[5,4]})

# repeat each row num_steps times, then number the copies 0..num_steps-1
(df.loc[df.index.repeat(df['num_steps'])]
   .assign(num_steps=lambda x: x.groupby(level=0).cumcount())
   .reset_index(drop=True)
)

Output:

   event_id  num_steps  avg_time
0         1          0         5
1         1          1         5
2         1          2         5
3         2          0         4
4         2          1         4
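
If the index is not unique, df.loc with repeated labels returns every matching row for each label, so the approach above would produce too many rows. One possible workaround is to repeat rows by position instead of by label. The following is only a sketch: the helper name expand_steps and its count_col parameter are made up for illustration and are not part of pandas, and it assumes the counts are non-negative integers.

```python
import numpy as np
import pandas as pd


def expand_steps(df: pd.DataFrame, count_col: str = "num_steps") -> pd.DataFrame:
    """Hypothetical helper: repeat each row df[count_col] times by position."""
    counts = df[count_col].to_numpy()
    # Positional row numbers, each repeated `count` times.
    pos = np.repeat(np.arange(len(df)), counts)
    out = df.iloc[pos].reset_index(drop=True)
    # Offset of the first copy of each source row, repeated alongside `pos`.
    starts = np.repeat(np.cumsum(counts) - counts, counts)
    # Step counter: position of each copy within its own run.
    out[count_col] = np.arange(len(pos)) - starts
    return out


# toy data with a deliberately duplicated index
df = pd.DataFrame({"event_id": [1, 2],
                   "num_steps": [3, 2],
                   "avg_time": [5, 4]},
                  index=[0, 0])

result = expand_steps(df)
print(result)
```

Because it uses iloc and plain NumPy arithmetic rather than groupby on the index, the result does not depend on the index labels at all. Rows with num_steps equal to 0 simply disappear from the output.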
