kevin.w.johnson
kevin.w.johnson

Reputation: 1794

Pandas explode row to variable number of rows

I'm not sure if "explode" is the right term here. I have a DataFrame with columns event_id, num_steps, and avg_time.

I'd like to end up with a DataFrame where, for each event_id, there are num_steps rows with each row counting from 0 to num_steps.

event_id, num_steps, avg_time

1, 3, 5

Should become:

event_id, num_steps, avg_time

1, 0, 5

1, 1, 5

1, 2, 5

Currently I'm iterating over the dataframe and creating this manually, but I'm wondering if there is any way to do this directly within Pandas to increase processing time?

Thanks!

Upvotes: 0

Views: 93

Answers (2)

Quang Hoang
Quang Hoang

Reputation: 150735

This assumes the index is unique:

# toy data
df = pd.DataFrame({'event_id':[1,2],
                  'num_steps':[3,2],
                  'avg_time':[5,4]})

(df.loc[df.index.repeat(df['num_steps'])]
   .assign(num_steps=lambda x: x.groupby(level=0).cumcount())
   .reset_index(drop=True)
)

Output:

   event_id  num_steps  avg_time
0         1          0         5
1         1          1         5
2         1          2         5
3         2          0         4
4         2          1         4

Upvotes: 1

BENY
BENY

Reputation: 323226

When we talking about the explode we do explode , before that we need convert the max steps to list of steps .

Notice make sure your pandas version is later than 0.25.0

s=df.groupby(['event_id','avg_time']).num_steps.\
      apply(lambda x : list(range(1,x.iloc[0]+1))).\
        explode().reset_index()
Out[43]: 
   event_id  avg_time num_steps
0         1         5         1
1         1         5         2
2         1         5         3

Upvotes: 1

Related Questions